Try our conversational search powered by Generative AI!

Arthur Vander Voort
Feb 18, 2023
  773
(0 votes)

Optimizely PIM - Data Cleansing in PIM: Save Time by Importing Messy Data

In my previous post, I talked pretty extensively about why setting up data governance is both an obstacle and essential to getting data into the PIM. Another common obstacle is trying to perfect your data prior to importing it to the PIM. You can spend a lot of time in an Excel spreadsheet, trying to find and replace a bunch of bad values across dozens or hundreds of columns. While this is a valid approach, Optimizley PIM also has functionality that you can leverage to cleanse your data directly in the application. This article outlines an approach to speed up your implementations without sacrificing data quality.

MessyDataSample.png

Before we get into it, here are a few caveats:

  • This approach is specifically for what we commonly call 'attribute' data – product data that has a defined list of values. Think of size, color, guage, width, length, material, finish, wattage, and so on.
  • Depending on the number of properties and property values, this can still be a time consuming process. 
  • This approach will not help you clean up data for unique product properties like product description or product title.
  • This approach is only recommended during implementation. You must re-establish strong data governance after cleansing your data to get value from this exercise.

Set up your properties to allow bad data

Properties that ony allow a defined list of values typically make up the bulk of product data. Normally, when data is imported, if a product has a value that is not on the list for this property, the data will not be imported, and the bad data will be isolated. The key to this approach is to intentionally set your properties in a way that allows bad data to be ingested. You must set all of your properties that use a defined list of values to allow ad hoc values. With ad hoc values enabled, no data governance is enforced, and any value will be imported and added to the property's value list.

If you already have all of your properties created, you can easily toggle them to allow ad hoc values:

  1. Filter the property list for properties that use a dropdown control type.
  2. Export properties and choose the option to only included the filtered list.
  3. In the exported file, set 'Allow Adhoc Values' to 'yes' for every row.
  4. Import the properties to the PIM.
SetAdhocs.gif

Import your messy product data

With the properties configured, you can now import your product data. Data governance will not prevent your messy data from being imported for these properties, which is exactly what we want.

  1. Go to imports.
  2. Select your product data file.
  3. Map your data and proceed through the import wizard.
  4. Initiate the product import.

Review & cleanse your data

With the import complete, all of the data, good and bad, has been imported and the property value lists have been updated. This is where the work starts. We are going to review the values for each property and correct any bad ones. To do this, we will filter for the properties we configured to allow ad hoc values and then work our way through the list.

  1. On the property list, apply a filter for properties that use a dropdown control type.
  2. Edit the first property.
  3. Go to the values tab.
  4. Review the list of values for bad data.
  5. For any bad value, click the edit icon and enter the preferred value (for instance, if you want '33 inches to' be '33 in', update this).
  6. When you have cleaned up all the bad values, toggle 'allow ad hoc values' to false and save the property.
  7. Repeat until you are through all properties.
PropertyValueCleanup.gif

When the data was initially imported, any bad values were added to the property list and saved to products that had those values. When you edit the property value and save the property, we will propagate the changes to all products that had the value. This lets you remove duplicate (e.g. 24" & 24 inches) and erroneous values while automatically updating the product data.

Re-establish your data governance to maintain quality

Now that all of our property value lists have been reviewed and the product data is cleansed,  we need to ensure that we don't waste this effort by allowing bad data back into the PIM. In the previous step, we recommended toggling the ad hoc value option off after you finished each property, but make sure you did not miss any. To do this, we can use the export/import approach again similar to when we initially enabled the ad hoc values.

  1. Filter the property list for properties that use a dropdown control type.
  2. Export properties and choose the option to only included the filtered list.
  3. In the exported file, set 'enable ad hoc' to 'no' for every row.
  4. Import the properties to PIM.
Feb 18, 2023

Comments

Please login to comment.
Latest blogs
Solving the mystery of high memory usage

Sometimes, my work is easy, the problem could be resolved with one look (when I’m lucky enough to look at where it needs to be looked, just like th...

Quan Mai | Apr 22, 2024 | Syndicated blog

Search & Navigation reporting improvements

From version 16.1.0 there are some updates on the statistics pages: Add pagination to search phrase list Allows choosing a custom date range to get...

Phong | Apr 22, 2024

Optimizely and the never-ending story of the missing globe!

I've worked with Optimizely CMS for 14 years, and there are two things I'm obsessed with: Link validation and the globe that keeps disappearing on...

Tomas Hensrud Gulla | Apr 18, 2024 | Syndicated blog

Visitor Groups Usage Report For Optimizely CMS 12

This add-on offers detailed information on how visitor groups are used and how effective they are within Optimizely CMS. Editors can monitor and...

Adnan Zameer | Apr 18, 2024 | Syndicated blog