Improving data quality for eCommerce organizations

Many eCommerce organizations try to keep up with the newest technologies. Doing so lets them make business processes more efficient, sell more products and increase revenue. Adopting new technologies early can also give you a competitive advantage, which in turn contributes to those goals. Artificial intelligence (AI) is one such technology: software that runs on smart algorithms can make your business processes more efficient and save your organization valuable man-hours and unnecessary costs. Sounds ideal, right?

Yes, but it is a little more complicated. New technologies come with challenges. To automate business processes that would otherwise have to be carried out manually, you need to be able to use your product data properly. This is a problem for many organizations: they receive product data from suppliers or databases, and that data is inconsistent with their own data model. For example, your supplier may give the measurements of a product in inches, while your data model stores them in centimeters. Another example is a supplier that delivers incomplete data. This complicates things for your own store, since you still want to inform your customers sufficiently. Well-informed customers can make a well-considered choice, which makes complete product data an important part of the customer experience. So, how can you get more out of your data?
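To make the inches-versus-centimeters example concrete, here is a minimal sketch of mapping a supplier record onto your own data model. The field names (name, width, width_unit, width_cm) and the list of supported units are assumptions for illustration, not part of any particular system.

```python
# Hypothetical supplier record and target model; all field names are assumptions.
INCH_TO_CM = 2.54  # one inch is exactly 2.54 cm

# Conversion factors to centimeters for the units this sketch knows about.
FACTORS_TO_CM = {"cm": 1.0, "mm": 0.1, "m": 100.0, "in": INCH_TO_CM}


def to_centimeters(value: float, unit: str) -> float:
    """Convert a length to centimeters; raise on units the sketch does not know."""
    try:
        return round(value * FACTORS_TO_CM[unit.lower()], 2)
    except KeyError:
        raise ValueError(f"Unsupported unit: {unit!r}") from None


def normalize_record(record: dict) -> dict:
    """Map a supplier record onto a model that stores width in centimeters
    and list the required fields the supplier left empty."""
    required = ("name", "width", "width_unit")
    missing = [field for field in required if record.get(field) in (None, "")]
    normalized = {"name": record.get("name"), "width_cm": None, "missing": missing}
    if "width" not in missing and "width_unit" not in missing:
        normalized["width_cm"] = to_centimeters(record["width"], record["width_unit"])
    return normalized


supplier_record = {"name": "Desk lamp", "width": 10, "width_unit": "in"}
print(normalize_record(supplier_record))
```

The same function reports incomplete supplier data through the "missing" list, so records that cannot be converted are flagged for follow-up instead of being silently guessed.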

How to get more out of your data

The answer is simple: improve your data quality and quantity. When both are in order, doors open to other possibilities. Most organizations appear to detect data quality issues only when employees or customers report them; just a minority proactively searches for these issues. Actively finding and solving them could benefit your organization and help you stay ahead of the competition. These are some examples of data quality issues:

  • Inconsistencies

Inconsistencies are values that don’t correspond with each other. A product name may state a volume of 100 ml, while the volume attribute contains a value of 0.25 l, which turns out to be the wrong value.

  • Anomalies

We speak of an anomaly when all products in a certain product class share a characteristic, except for one. For example, the data on every jar of peanut butter mentions a nut allergy, except for one jar. It can be assumed that this jar also contains nuts; the data just does not say so.

  • Duplicates

Duplicates are identical data values that appear more than once in the same data set. They decrease your data quality, so make sure your data set does not contain any.

  • Unlikely data values

The concept of unlikely data values speaks for itself. These are values that are highly unlikely for certain features. An example would be a smartphone with a screen size of 60 inches or a faucet with a height of 15 meters.
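The four issue types above can each be expressed as a simple rule-based check. The sketch below is an illustration only: the product records, field names, plausibility ranges and the 75% majority threshold are all assumptions, and real tools may use statistical or machine-learning methods instead of fixed rules.

```python
import re
from collections import Counter, defaultdict

# Hypothetical product records; every field name here is an assumption.
PRODUCTS = [
    {"id": 1, "category": "shampoo", "name": "Shampoo 100ML", "volume_l": 0.25},
    {"id": 2, "category": "peanut butter", "name": "Peanut butter smooth", "contains_nuts": True},
    {"id": 3, "category": "peanut butter", "name": "Peanut butter crunchy", "contains_nuts": True},
    {"id": 4, "category": "peanut butter", "name": "Peanut butter organic", "contains_nuts": True},
    {"id": 5, "category": "peanut butter", "name": "Peanut butter light", "contains_nuts": None},
    {"id": 6, "category": "peanut butter", "name": "Peanut butter smooth", "contains_nuts": True},
    {"id": 7, "category": "smartphone", "name": "Phone X", "screen_inch": 60},
]

# Assumed plausibility ranges per (category, attribute); tune these per catalog.
PLAUSIBLE_RANGES = {
    ("smartphone", "screen_inch"): (3, 8),
    ("faucet", "height_cm"): (5, 100),
}


def find_inconsistencies(products):
    """Flag products whose name states a volume that differs from volume_l."""
    flagged = []
    for p in products:
        match = re.search(r"(\d+(?:\.\d+)?)\s*(ml|l)\b", p["name"], re.IGNORECASE)
        if match and p.get("volume_l") is not None:
            amount, unit = float(match.group(1)), match.group(2).lower()
            liters = amount / 1000 if unit == "ml" else amount
            if abs(liters - p["volume_l"]) > 1e-9:
                flagged.append(p["id"])
    return flagged


def find_anomalies(products, attribute, min_share=0.75):
    """Flag products that deviate from a value shared by most of their category."""
    by_category = defaultdict(list)
    for p in products:
        if attribute in p:
            by_category[p["category"]].append(p)
    flagged = []
    for group in by_category.values():
        value, count = Counter(p[attribute] for p in group).most_common(1)[0]
        # Only flag when a clear majority agrees but not every product does.
        if min_share <= count / len(group) < 1:
            flagged += [p["id"] for p in group if p[attribute] != value]
    return flagged


def find_duplicates(products):
    """Flag records that repeat an earlier record in every field except id."""
    seen, flagged = set(), []
    for p in products:
        key = tuple(sorted((k, v) for k, v in p.items() if k != "id"))
        if key in seen:
            flagged.append(p["id"])
        seen.add(key)
    return flagged


def find_unlikely_values(products, ranges=PLAUSIBLE_RANGES):
    """Flag values that fall outside the assumed plausible range for a category."""
    flagged = []
    for p in products:
        for (category, attribute), (low, high) in ranges.items():
            value = p.get(attribute)
            if p["category"] == category and value is not None and not low <= value <= high:
                flagged.append((p["id"], attribute, value))
    return flagged


print("Inconsistencies:", find_inconsistencies(PRODUCTS))
print("Anomalies:", find_anomalies(PRODUCTS, "contains_nuts"))
print("Duplicates:", find_duplicates(PRODUCTS))
print("Unlikely values:", find_unlikely_values(PRODUCTS))
```

Each check only flags candidates. Whether the product name or the attribute holds the correct volume, for example, still has to be decided by a person or a further rule.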

Fixing these issues

These issues should be prevented, and when your data does suffer from them, they need to be solved. One possibility is to improve your data manually by checking every feature of every product. For organizations that sell many products, like wholesalers and some retailers, this is an extremely time-consuming and expensive process. Another possibility is to automate the process of optimizing data quality and quantity. Some software uses machine learning and AI algorithms to scan a product dataset and find inconsistencies, anomalies, duplicates, and unlikely data values. This helps you prevent liabilities due to incorrect product data, saves many man-hours of manually searching, judging, and correcting product data, and increases your product data fill rates. In this way, your overall data quality improves, which can contribute to increased revenue.
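The text does not define fill rate. A common reading, assumed here, is the share of products for which an attribute has a non-empty value. Under that assumption, a minimal sketch for measuring it before and after a clean-up could look like this (the catalog and attribute names are hypothetical):

```python
def fill_rate(products, attributes):
    """Return, per attribute, the share of products with a non-empty value (0 to 1)."""
    if not products:
        return {attribute: 0.0 for attribute in attributes}
    rates = {}
    for attribute in attributes:
        filled = sum(1 for p in products if p.get(attribute) not in (None, ""))
        rates[attribute] = filled / len(products)
    return rates


# Hypothetical catalog with some gaps in the supplier data.
catalog = [
    {"name": "Mug", "color": "white", "weight_g": 310},
    {"name": "Plate", "color": "", "weight_g": None},
    {"name": "Bowl", "color": "blue", "weight_g": 420},
    {"name": "Cup", "color": "green"},
]

print(fill_rate(catalog, ["name", "color", "weight_g"]))
```

Tracking these numbers per attribute shows where supplier data is thinnest and whether an automated enrichment step actually raised them.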

Automating data quality improvement

A smart tool makes automated data quality improvement possible. No technical knowledge is required: you only need to upload your Excel file with product information to the tool. You can also automate importing and exporting your data through a direct API link, which can be connected to your PIM system. After this, the algorithm does the hard work for you and detects duplicates, inconsistencies, anomalies, and unlikely values in an instant. The matched features and values should then be validated manually to make sure that all data is now of sufficient quality. This is far less time-consuming than manually checking all features of all products.
