I have tried following the "Removing HTML tags" examples, but as far as I can see, they are not compatible with XhtmlString properties.
Nothing changes when I add the attribute to my property.
The convention is simply not supported in the API for XhtmlString.
Maybe I did something wrong?
I deleted the index and published the page again, but with little luck.
Are you sure this attribute is supposed to work for XhtmlString properties as well?
If you look at Find's explore view, does the Body property contain HTML? What about the "SearchText" field in the index?
The SearchText field gets indexed including HTML tags, but the HTML gets stripped from the AsViewedByAnonymous property of the Body property.
"SearchText$$string": "Om os <p>Dette er en test af hvordan </p>\n<ul>\n<li>EPiServer</li>\n<li>har tænkt sig at <strong>håndtere</strong> HTML</li>\n<li>i indholdsfelter</li>\n</ul>\n<p>Går det mon godt i <strong>dette</strong> tilfælde?</p> epi.cms.contentdata:///105",
"Body": {
"IsEmpty$$bool": false,
"___types": [
"EPiServer.Core.XhtmlString",
"System.Object",
"System.Web.IHtmlString",
"System.Runtime.Serialization.ISerializable",
"EPiServer.Data.Entity.IReadOnly`1[[EPiServer.Core.XhtmlString, EPiServer, Version=8.8.1.0, Culture=neutral, PublicKeyToken=8fe83dea738b45b7]]",
"EPiServer.Data.Entity.IReadOnly"
],
"AsViewedByAnonymous$$string": "Dette er en test af hvordan EPiServer har tænkt sig at håndtere HTML i indholdsfelter Går det mon godt i dette tilfælde?",
"IsModified$$bool": false,
"$type": "EPiServer.Core.XhtmlString, EPiServer"
},
Sounds like a bug in the SearchText extension to me. This does not happen in EPiServer Find 11.
A workaround could be to add the following to an initializable module:
SearchClient.Instance.Conventions.ForInstancesOf<IContent>().Field(x => x.SearchText()).StripHtml();
and then reindex. I have not tried this though.
Using your code gives me some strange results.
When I execute a query I get TotalMatchingResult = 1, but the hits collection is empty.
But the code below seems to fix it.
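public static string StripHtml(string html)
{
    // Guard against null or empty values before stripping.
    if (!string.IsNullOrEmpty(html))
    {
        // Extension method from EPiServer.Find.Helpers.Text.StringExtensions.
        return html.StripHtml();
    }
    return html;
}

public void Initialize(InitializationEngine context)
{
    // Strip HTML from the SearchText field before it is serialized to the index.
    SearchClient.Instance.Conventions
        .ForInstancesOf<IContent>()
        .Field(x => x.SearchText())
        .ConvertBeforeSerializing(StripHtml);
}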
Now the only problem is that the SearchText includes some weird suffix, which is searchable and is sometimes included in the Excerpt.
"SearchText$$string": "Om os Dette er en test af hvordan \n \n EPiServer \n har tænkt sig at håndtere HTML \n i indholdsfelter \n \n Går det mon godt i dette tilfælde? \n test test test test epi.cms.contentdata:///105",
Edit: Now using EPiServer.Find.Helpers.Text.StringExtensions.StripHtml() instead of custom Regex.
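For anyone who wants to see what such a null-safe converter does to a value without the Find package at hand, here is a rough standalone sketch. The regex-based HtmlText.StripHtml below is a hypothetical stand-in, not the implementation of EPiServer.Find's StripHtml() extension. It also collapses the leftover line breaks, which is an assumption about the desired output and not something the Find helper is known to do.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical stand-in for the Find StripHtml() extension: matches any tag-like run.
    private static readonly Regex Tag = new Regex("<[^>]+>", RegexOptions.Compiled);

    // Matches runs of whitespace, including the line breaks left behind by block tags.
    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    public static string StripHtml(string html)
    {
        // Same guard as the converter above: pass null or empty values through untouched.
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        // Replace tags with a space so adjacent words do not run together.
        string withoutTags = Tag.Replace(html, " ");

        // Decode entities such as &amp; into plain characters.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // Collapse whitespace runs into single spaces and trim the ends.
        return Whitespace.Replace(decoded, " ").Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        string body = "<p>Dette er en <strong>test</strong></p>\n<ul>\n<li>EPiServer</li>\n</ul>";
        Console.WriteLine(HtmlText.StripHtml(body)); // prints "Dette er en test EPiServer"
    }
}
```

A method with this shape (string in, string out) could be passed to ConvertBeforeSerializing in the same way as the StripHtml method above.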
I think you should try a different approach as this is getting slightly hacky :-)
How about implementing your own SearchText property on your page? This will override the SearchText field in the index, so you will have full control over what ends up in it.
Do this by adding a new property named SearchText to your content type or base class, for example:
public virtual string SearchText => string.Format(CultureInfo.InvariantCulture, "{0} {1} {2}", PageName, MainBody != null ? MainBody.AsViewedByAnonymous() : "", MetaTitle);
If you are not using C# 6.0:
public virtual string SearchText
{
    get
    {
        // CultureInfo lives in System.Globalization.
        return string.Format(
            CultureInfo.InvariantCulture,
            "{0} {1} {2}",
            PageName,
            MainBody != null ? MainBody.AsViewedByAnonymous() : "",
            MetaTitle);
    }
}
I have an XhtmlString property on a page which is indexed using Find.
[CultureSpecific]
[Display(
Name = "Body",
GroupName = Core.EPiServer.Ui.PageTabNames.Content,
Order = 20)]
public virtual XhtmlString Body { get; set; }
But when I search using SearchClient.Instance.UnifiedSearchFor(query), the Excerpt property includes HTML from the properties of type XhtmlString.
I would like Find to strip the HTML before adding the content to the index.
What is the best way to implement this?