Prevent indexing of PII data
Describes how to filter personally identifiable information (PII) out of search queries so that it is not indexed in Optimizely Search & Navigation.
Filtering PII is an important part of managing GDPR compliance.
Use IGDPRConventions and ITrackSanitizerPatternRepository to add the filtering.
Conventions
IGDPRConventions has these methods.
Description | Signature |
---|---|
Set the patterns used to remove GDPR data from a search query. | `public virtual void SetGDPRPatterns(List gdprPatterns)` |
Get the GDPR patterns that are removed from a search query. | `public virtual IEnumerable GetGDPRPatterns()` |
Delete the GDPR data in the search query that matches the patterns. | `public string RemoveGDPRDataInQuery(string queryStringQuery)` |
ITrackSanitizerPatternRepository
ITrackSanitizerPatternRepository has these methods.
Method description | Signature |
---|---|
Add patterns to remove PII data from a search query | |
Add a single pattern | `public string Add(TrackSanitizerPattern pattern)` |
Add multiple patterns | `public void Add(IEnumerable<TrackSanitizerPattern> patterns)` |
Update patterns to remove PII data from a search query | |
Update a single pattern | `public string Update(TrackSanitizerPattern pattern)` |
Update multiple patterns | `public bool Update(IEnumerable<TrackSanitizerPattern> patterns)` |
Get patterns to remove PII data from a search query | |
Get all patterns | `public IEnumerable<TrackSanitizerPattern> GetAll()` |
Get a pattern by Id | `public TrackSanitizerPattern Get(string patternId)` |
Delete patterns used to remove PII data from a search query | |
Delete a pattern by Id | `public void Delete(string patternId)` |
Delete all patterns | `public void DeleteAll()` |
Example
Patterns can be plain text, wildcards, or regular expressions. Here are some example filters.
- Full name (plain text): “John Smith”, “Steven” …
- Keyword containing an email domain (wildcard): “*@gmail.com”, “*@yahoo.com” …
- Regex string: “\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*” …
```csharp
using System.Collections.Generic;
// Also import the Optimizely Search & Navigation (Find) namespaces that define
// IClient, IStatisticsClient, and the track sanitizer types.

public class Sample
{
    protected IClient _client;
    protected IStatisticsClient _statisticsClient;
    protected ITrackSanitizerPatternRepository _trackSaniziterRepository;

    public Sample(IClient client)
    {
        _client = client;
        _statisticsClient = client.Statistics();
        _trackSaniziterRepository = client.TrackSanitizer().TrackSaniziterRepository;
    }

    public void Test()
    {
        // Register the sanitizer patterns: plain text, wildcard, and regex.
        _trackSaniziterRepository.Add(new List<TrackSanitizerPattern>
        {
            new TrackSanitizerPattern
            {
                PatternString = "admin",
                PatternType = TrackSanitizerFilterType.PlainText
            },
            new TrackSanitizerPattern
            {
                PatternString = "email",
                PatternType = TrackSanitizerFilterType.PlainText
            },
            new TrackSanitizerPattern
            {
                PatternString = "*@mail.com",
                PatternType = TrackSanitizerFilterType.Wildcard
            },
            new TrackSanitizerPattern
            {
                PatternString = "1#1",
                PatternType = TrackSanitizerFilterType.Wildcard
            },
            new TrackSanitizerPattern
            {
                PatternString = "c[a-e]ll",
                PatternType = TrackSanitizerFilterType.Wildcard
            },
            new TrackSanitizerPattern
            {
                PatternString = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
                PatternType = TrackSanitizerFilterType.Regex
            }
        });

        // Run a tracked search; the query passes through the sanitizer.
        var result = _client
            .UnifiedSearchFor(@"[email protected]")
            .StatisticsTrack()
            .GetResult();

        // Look up GDPR data for the exact term that matched a sanitizer pattern.
        var response = _statisticsClient.GetGDPR("[email protected]", x => {});
    }
}
```
The _statisticsClient.GetGDPR() API supports only exact-term search because of limitations in the statistics indexes.
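To make the three pattern types used in the sample more concrete, here is a minimal, self-contained sketch of how plain-text, wildcard, and regex patterns could each be applied to a query string. It does not use or reproduce the Search & Navigation implementation. The names `SketchPatternType`, `SanitizerSketch`, `WildcardToRegex`, and `Sanitize` are hypothetical, and the wildcard rules (`*` for any run of non-space characters, `#` for one digit, `[a-e]` for a character range) are assumptions inferred from the sample patterns, not documented behavior.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the three pattern types; not the library's own enum.
public enum SketchPatternType { PlainText, Wildcard, Regex }

public static class SanitizerSketch
{
    // Assumed wildcard rules: '*' = any run of non-space characters,
    // '#' = one digit, '[...]' = a character class copied through unchanged.
    public static string WildcardToRegex(string wildcard)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < wildcard.Length; i++)
        {
            char c = wildcard[i];
            int close = c == '[' ? wildcard.IndexOf(']', i + 1) : -1;
            if (c == '*') sb.Append(@"\S*");
            else if (c == '#') sb.Append(@"\d");
            else if (close > i)
            {
                sb.Append(wildcard, i, close - i + 1); // copy "[a-e]" as is
                i = close;
            }
            else sb.Append(Regex.Escape(c.ToString())); // literal character
        }
        return sb.ToString();
    }

    // Removes every match of the pattern from the query, then collapses
    // the leftover whitespace.
    public static string Sanitize(string query, string pattern, SketchPatternType type)
    {
        string regex;
        if (type == SketchPatternType.PlainText) regex = Regex.Escape(pattern);
        else if (type == SketchPatternType.Wildcard) regex = WildcardToRegex(pattern);
        else regex = pattern; // already a regular expression

        string stripped = Regex.Replace(query, regex, "", RegexOptions.IgnoreCase);
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

Under these assumptions, `SanitizerSketch.Sanitize("call cell coll", "c[a-e]ll", SketchPatternType.Wildcard)` returns `"coll"`, because only `call` and `cell` fall inside the `[a-e]` range.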
Install and verify
The steps below describe how to set up and verify the PII filtering. You need the following:
- CMS Alloy sample site (for CMS 11 and Commerce 13) installed from the Visual Studio Extension. See also Installing Optimizely .NET5 for CMS 12 and Commerce 14.
- Optimizely Search & Navigation service URL and default index name, for example http://es-api-test01.episerver.com/<PRIVATE_KEY>.
- Optimizely Search & Navigation client-side resource base URL, for example https://dl.episerver.net/13.2.0.
- Optimizely Search & Navigation 13.2.0
Install packages
- In Visual Studio, set the default project to Templates.Alloy.
- Install the following NuGet packages (use the “-pre” option to get the latest development package):
  - Find.Cms
  - Find.Statistics
- Open the Alloy web.config file and update the following entries:
  - In the <episerver.find> tag, set serviceUrl and defaultIndex.
  - In the <episerver.find.ui> tag, set clientSideResourceBaseUrl.
  - In the <appSettings> tag, add an item with key episerver:Find.TrackingSanitizerEnabled and value true.
- Access Admin Mode and add a GDPR test page.
  a. Go to CMS > Admin > Content Type tab > Page Types > [Specialized] Start Page > Settings.
  b. Click Available Page Types, select [Specialized] Find GDPR API Demo Page, and click Save.
- Go to the CMS Edit > navigation panel > Pages tab > Start branch of the tree structure.
- Create a GDPR Search page and publish it.
- Return to CMS > Admin view.
- Under Scheduled jobs, click Optimizely Find Content Indexing Job and start that job manually.
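The web.config entries from the steps above could look like the following fragment. This is a sketch only: `YOUR_SERVICE_URL` and `YOUR_INDEX_NAME` are placeholders for the values of your own index, the client-side resource URL is the example value given in the prerequisites, and the surrounding section declarations that the Alloy template already contains are omitted.

```xml
<configuration>
  <!-- Existing sections of the Alloy web.config are omitted here. -->

  <!-- Service URL and default index of your Search & Navigation index (placeholders). -->
  <episerver.find serviceUrl="YOUR_SERVICE_URL" defaultIndex="YOUR_INDEX_NAME" />

  <!-- Client-side resource base URL, using the example value from the prerequisites. -->
  <episerver.find.ui clientSideResourceBaseUrl="https://dl.episerver.net/13.2.0" />

  <appSettings>
    <!-- Turns on sanitizing of tracked search queries. -->
    <add key="episerver:Find.TrackingSanitizerEnabled" value="true" />
  </appSettings>
</configuration>
```

Merge these entries into the existing elements of the file rather than adding duplicate `<episerver.find>` or `<appSettings>` elements.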
Verify
In these steps, you perform a search, delete the GDPR-related data, and add a filtering pattern to prevent it from being indexed.
- Open the GDPR Demo page created in the previous steps. Clear the GDPR pattern settings to verify that the tracking function runs well.
- Go to the search page and execute a search with some keywords.
- Go to the GDPR Demo page and review the displayed data.
- Delete the existing GDPR data and set patterns to prevent it from being tracked again.
- Search again and recheck for the GDPR data. This should now be filtered out.