I have an EPiServer 7.5 site with Find. I'm indexing lots of files and I would like the UnifiedSearchHit.Excerpt to always be taken from the file contents. This is the case when no custom properties on the MediaData are filled in, but as soon as some property has a value, that value ends up in the SearchText property in the index and the excerpt is taken from it instead. I know about the SearchSummary field to control the excerpt, and about conventions etc., but how do I make Find take the excerpt from the file contents and not from the properties?
You can use the Searchable attribute on the other properties and set it to false. That way, their values will not be included in the SearchText field.
But then they won't be searchable either, right? We still want them to be indexed, just not as the excerpt.
Some dude named Henrik at EPiServer was kind enough to help me out with this. Cheers, Henrik!
So this is how you tell Find to always use the file contents as excerpt:
SearchClient.Instance.Conventions.UnifiedSearchRegistry
    .ForInstanceOf<DocumentMediaData>()
    .ProjectExcerptUsing<ISearchContent>(spec =>
        doc => doc.SearchAttachment.AsCropped(spec.ExcerptLength));
Ah, great! You might want to set .ProjectHighlightedExcerptUsing as well. If you use highlighting, this makes sure the file content is returned, and if there are no words to highlight in the file content, the cropped content is returned instead.
spec => x => !string.IsNullOrEmpty(x.SearchAttachment().AsHighlighted())
    ? x.SearchAttachment().AsHighlighted()
    : x.SearchAttachment().AsCropped(spec.ExcerptLength));
As a side note, setting the Searchable attribute to false only removes the property from the SearchText field. The original property will still be indexed. To remove a property from the index completely, the JsonIgnore attribute should be used.
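To illustrate the difference between the two attributes, here is a minimal sketch of what the media type might look like. The property names Description and InternalNotes are hypothetical and only there for illustration; the class name is taken from the convention above, and the exact base class and content type registration in a real project may differ.

```csharp
using EPiServer.Core;
using EPiServer.DataAnnotations;
using Newtonsoft.Json;

[ContentType]
public class DocumentMediaData : MediaData
{
    // Hypothetical property: kept out of SearchText (and so out of the
    // default excerpt), but still indexed as its own field.
    [Searchable(false)]
    public virtual string Description { get; set; }

    // Hypothetical property: not serialized to the index at all.
    [JsonIgnore]
    public virtual string InternalNotes { get; set; }
}
```

Note that this only changes what goes into SearchText. If the properties must remain part of SearchText, the excerpt convention shown earlier in the thread is the way to go.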