Hi,
Yes, you can change the StripHtml convention by replacing the default conversion for strings and XhtmlStrings:
SearchClient.Instance.Conventions.ForInstancesOf<PageData>().FieldsOfType<string>().ConvertBeforeSerializing(x => x.MyHtmlStripMethod());
SearchClient.Instance.Conventions.ForInstancesOf<XhtmlString>().Field(x => x.AsViewedByAnonymous()).ConvertBeforeSerializing(x => x.MyHtmlStripMethod());
Regards,
Henrik
Was a bit too quick on the trigger with the solved link.
I use the following code, but the string that is passed in to "ReplaceHtmlTagsWithSpace()" has already been stripped of all HTML tags:
SearchClient.Instance.Conventions.ForInstancesOf<PageData>().FieldsOfType<string>().ConvertBeforeSerializing(x => x.ReplaceHtmlTagsWithSpace());
I'll look into that. For now, you should be able to avoid this by loading your conventions before the CMS does. If you set them up in Application_Start in Global.asax, your convention should be executed before the CMS's.
No difference when I set up the convention in Application_Start; the HTML tags have already been stripped before execution hits my breakpoint inside ReplaceHtmlTagsWithSpace(). I'm using EPiServer.Find 1.0.0.314.
OK, then we need to use the final trick: reset the convention for the properties where you have this issue by excluding and re-including them, then add your own conversion:
client.Conventions.ForInstancesOf<PageData>().ExcludeField(x => x.PageName);
client.Conventions.ForInstancesOf<PageData>().IncludeField(x => x.PageName);
client.Conventions.ForInstancesOf<PageData>()
.Field(x => x.PageName)
.ConvertBeforeSerializing(x => x.MyExtension());
A little messy, but it will solve your issue without affecting any querying code (as it would have if we had extended all properties with a pre-stripped-HTML version).
Yep, that solved it! It would be nice though if it were possible to do it using the other solution you posted... :)
A possibly better way of doing this is to "overwrite" the PageName used by the JSON serializer with the help of the JsonPropertyAttribute, like so:
[JsonProperty(PropertyName = "PageName")]
public string PageNameForIndexing
{
    get { return PageName.ReplaceHtmlTagsWithSpace(); }
}
This solution also makes it possible to combine other page properties into the value indexed as PageName, which was my requirement.
It seems like the indexer is replacing tags with an empty string. Is it possible to configure it to replace them with a space instead?
<p><em>CEO</em><br><a href="mailto:kalle.banan@company.com">Kalle Banan</a></p>
This gets indexed as "CEOKalle Banan" and I would like it to be "CEO Kalle Banan"
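For anyone landing here: the thread never shows the body of ReplaceHtmlTagsWithSpace(), so here is a minimal sketch of what such an extension might look like. It is an assumption, not the poster's actual code and not part of EPiServer Find. The class name HtmlStringExtensions and the Program wrapper are made up for illustration. It uses a naive regex that swaps every tag for a space and then collapses runs of whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the name matches the extension used in this thread,
// but the implementation is an illustrative assumption.
public static class HtmlStringExtensions
{
    // Anything that looks like a tag: '<', one or more non-'>' characters, '>'.
    private static readonly Regex Tag = new Regex("<[^>]+>", RegexOptions.Compiled);

    // Any run of whitespace, used to collapse the spaces the tags leave behind.
    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    public static string ReplaceHtmlTagsWithSpace(this string html)
    {
        // Pass null and empty strings through unchanged.
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        // Replace each tag with a space so adjacent words stay separated.
        string spaced = Tag.Replace(html, " ");

        // Collapse repeated whitespace and trim the ends.
        return Whitespace.Replace(spaced, " ").Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        string markup = "<p><em>CEO</em><br><a href=\"mailto:kalle.banan@company.com\">Kalle Banan</a></p>";

        // Prints: CEO Kalle Banan
        Console.WriteLine(markup.ReplaceHtmlTagsWithSpace());
    }
}
```

Note that a regex like this does not decode HTML entities and will trip over a ">" inside an attribute value, so for anything beyond simple editor markup a real HTML parser is probably the safer choice. It also only helps if your conversion sees the raw markup, which is what the exclude/include trick or the JsonProperty approach above is for.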