Replace tags with a space

It seems the indexer replaces HTML tags with an empty string. Is it possible to configure it to replace them with a space instead?

<p><em>CEO</em><br><a href="mailto:kalle.banan@company.com">Kalle Banan</a></p>

This gets indexed as "CEOKalle Banan", and I would like it to be "CEO Kalle Banan".

#71335
May 16, 2013 8:37

Hi,

Yes, you can change the StripHtml convention by replacing the default for strings and XhtmlStrings:

SearchClient.Instance.Conventions.ForInstancesOf<PageData>().FieldsOfType<string>().ConvertBeforeSerializing(x => x.MyHtmlStripMethod());
SearchClient.Instance.Conventions.ForInstancesOf<XhtmlString>().Field(x => x.AsViewedByAnonymous()).ConvertBeforeSerializing(x => x.MyHtmlStripMethod());

Regards,

Henrik

#71341
May 16, 2013 9:09

I was a bit too quick on the trigger with the solved link.

I use the following code, but the string that is passed in to "ReplaceHtmlTagsWithSpace()" has already been stripped of all HTML tags:

SearchClient.Instance.Conventions.ForInstancesOf<PageData>()
       .FieldsOfType<string>()
       .ConvertBeforeSerializing(x => x.ReplaceHtmlTagsWithSpace());
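
The thread never shows the body of ReplaceHtmlTagsWithSpace(), so for readers following along, here is a minimal sketch of what such an extension method might look like. The class name, the regular expressions, and the whitespace clean-up are assumptions for illustration, not the poster's actual code. The tag pattern is deliberately naive and will misbehave on markup that has a ">" inside an attribute value.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlStringExtensions
{
    // Matches anything that looks like a tag, e.g. <p>, </em>, <br>, <a href="...">.
    private static readonly Regex TagPattern = new Regex("<[^>]+>", RegexOptions.Compiled);

    // Matches runs of whitespace so they can be collapsed to a single space.
    private static readonly Regex WhitespacePattern = new Regex(@"\s+", RegexOptions.Compiled);

    // Hypothetical implementation; the real method body is not shown in the thread.
    public static string ReplaceHtmlTagsWithSpace(this string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        // Replace each tag with a space instead of removing it outright.
        string withSpaces = TagPattern.Replace(html, " ");

        // Decode entities such as &amp;, then collapse and trim the whitespace.
        string decoded = WebUtility.HtmlDecode(withSpaces);
        return WhitespacePattern.Replace(decoded, " ").Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<p><em>CEO</em><br><a href=\"mailto:kalle.banan@company.com\">Kalle Banan</a></p>";
        Console.WriteLine(html.ReplaceHtmlTagsWithSpace()); // prints "CEO Kalle Banan"
    }
}
```

A method like this only helps if it receives the raw markup, which is exactly the problem described above: by the time the convention runs, the tags are already gone.
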
#71350
Edited, May 16, 2013 10:36

I'll look into that. For now, you should be able to avoid this by loading your conventions before the CMS does. If you set them up in Application_Start in Global.asax, your convention should be executed before the CMS's.

#71353
May 16, 2013 11:32

No difference when I set up the convention in Application_Start; the HTML tags have already been stripped before it hits my breakpoint inside ReplaceHtmlTagsWithSpace(). I am using EPiServer.Find 1.0.0.314.

#71363
May 16, 2013 13:20

OK, then we need to use the final trick: reset the convention for the properties where you have this issue by excluding and then re-including them:

client.Conventions.ForInstancesOf<PageData>().ExcludeField(x => x.PageName);
client.Conventions.ForInstancesOf<PageData>().IncludeField(x => x.PageName);
client.Conventions.ForInstancesOf<PageData>()
       .Field(x => x.PageName)
       .ConvertBeforeSerializing(x => x.MyExtension());

A little messy, but it will solve your issue without affecting any querying code (as it would have if we had extended all properties with a pre-stripped HTML version).

#71370
May 16, 2013 14:50

Yep, that solved it! It would be nice though if it were possible to do it using the other solution you posted... :)

#71373
May 16, 2013 15:11

A possibly better way of doing this is to "overwrite" the PageName used by the JSON serializer with the help of the JsonPropertyAttribute, like so:

[JsonProperty(PropertyName = "PageName")]
public string PageNameForIndexing
{
     get { return PageName.ReplaceHtmlTagsWithSpace(); }
}

This solution also makes it possible to combine other page properties into the value indexed as PageName, which was my requirement.

#113262
Edited, Nov 17, 2014 12:55