Arthur Vander Voort
Feb 18, 2023
(0 votes)

Optimizely PIM - Data Cleansing in PIM: Save Time by Importing Messy Data

In my previous post, I talked pretty extensively about why setting up data governance is both an obstacle and essential to getting data into the PIM. Another common obstacle is trying to perfect your data prior to importing it to the PIM. You can spend a lot of time in an Excel spreadsheet, trying to find and replace a bunch of bad values across dozens or hundreds of columns. While this is a valid approach, Optimizley PIM also has functionality that you can leverage to cleanse your data directly in the application. This article outlines an approach to speed up your implementations without sacrificing data quality.


Before we get into it, here are a few caveats:

  • This approach is specifically for what we commonly call 'attribute' data – product data that has a defined list of values. Think of size, color, guage, width, length, material, finish, wattage, and so on.
  • Depending on the number of properties and property values, this can still be a time consuming process. 
  • This approach will not help you clean up data for unique product properties like product description or product title.
  • This approach is only recommended during implementation. You must re-establish strong data governance after cleansing your data to get value from this exercise.

Set up your properties to allow bad data

Properties that ony allow a defined list of values typically make up the bulk of product data. Normally, when data is imported, if a product has a value that is not on the list for this property, the data will not be imported, and the bad data will be isolated. The key to this approach is to intentionally set your properties in a way that allows bad data to be ingested. You must set all of your properties that use a defined list of values to allow ad hoc values. With ad hoc values enabled, no data governance is enforced, and any value will be imported and added to the property's value list.

If you already have all of your properties created, you can easily toggle them to allow ad hoc values:

  1. Filter the property list for properties that use a dropdown control type.
  2. Export properties and choose the option to only included the filtered list.
  3. In the exported file, set 'Allow Adhoc Values' to 'yes' for every row.
  4. Import the properties to the PIM.

Import your messy product data

With the properties configured, you can now import your product data. Data governance will not prevent your messy data from being imported for these properties, which is exactly what we want.

  1. Go to imports.
  2. Select your product data file.
  3. Map your data and proceed through the import wizard.
  4. Initiate the product import.

Review & cleanse your data

With the import complete, all of the data, good and bad, has been imported and the property value lists have been updated. This is where the work starts. We are going to review the values for each property and correct any bad ones. To do this, we will filter for the properties we configured to allow ad hoc values and then work our way through the list.

  1. On the property list, apply a filter for properties that use a dropdown control type.
  2. Edit the first property.
  3. Go to the values tab.
  4. Review the list of values for bad data.
  5. For any bad value, click the edit icon and enter the preferred value (for instance, if you want '33 inches to' be '33 in', update this).
  6. When you have cleaned up all the bad values, toggle 'allow ad hoc values' to false and save the property.
  7. Repeat until you are through all properties.

When the data was initially imported, any bad values were added to the property list and saved to products that had those values. When you edit the property value and save the property, we will propagate the changes to all products that had the value. This lets you remove duplicate (e.g. 24" & 24 inches) and erroneous values while automatically updating the product data.

Re-establish your data governance to maintain quality

Now that all of our property value lists have been reviewed and the product data is cleansed,  we need to ensure that we don't waste this effort by allowing bad data back into the PIM. In the previous step, we recommended toggling the ad hoc value option off after you finished each property, but make sure you did not miss any. To do this, we can use the export/import approach again similar to when we initially enabled the ad hoc values.

  1. Filter the property list for properties that use a dropdown control type.
  2. Export properties and choose the option to only included the filtered list.
  3. In the exported file, set 'enable ad hoc' to 'no' for every row.
  4. Import the properties to PIM.
Feb 18, 2023


Please login to comment.
Latest blogs
Zombie Properties want to Eat Your Brains

It’s a story as old as time. You work hard to build a great site. You have all the right properties – with descriptive names – that the content...

Joe Mayberry | Mar 29, 2023 | Syndicated blog

Optimizely finally releases new and improved list properties!

For years, the Generic PropertyList has been widely used, despite it being unsupported. Today a better option is released!

Tomas Hensrud Gulla | Mar 28, 2023 | Syndicated blog

Official List property support

Introduction Until now users were able to store list properties in three ways: Store simple types (int, string, DateTime, double) as native...

Bartosz Sekula | Mar 28, 2023

New dashboard implemented in CMS UI 12.18.0

As part of the CMS UI 12.18.0 release , a new dashboard has been added as a ‘one stop shop’ to enable editors to access all of their content items,...

Matthew Slim | Mar 28, 2023