UnifiedSearch excerpt length not constrained as expected.

swc

We are indexing PDFs with Find. When we show the results, we display the title and the highlighted excerpt. The issue I am seeing is that the excerpt length doesn't seem to be respecting any of the limits we've passed in, so the excerpt is far too long.

In the search result template, we are rendering it using:

result.Document.Excerpt

In our Find Initialization, we are setting up the highlighted excerpt projection:

SearchClient.Instance.Conventions.UnifiedSearchRegistry.ForInstanceOf<MediaData>()
    .ProjectHighlightedExcerptUsing<MediaData>(spec => doc => !string.IsNullOrEmpty(doc.SearchAttachmentText().AsHighlighted())
        ? doc.SearchAttachmentText().AsHighlighted(new HighlightSpec() { FragmentSize = spec.ExcerptLength, NumberOfFragments = 1 })
        : doc.SearchAttachmentText().AsCropped(spec.ExcerptLength));

When searching, we pass in a hit spec with highlighting enabled on the excerpt:

var hitSpec = new HitSpecification
{
    HighlightExcerpt = true,
    EncodeTitle = false,
    EncodeExcerpt = false,
    ExcerptHighlightSpecAction = spec => new HighlightSpec { FragmentSize = 100, NumberOfFragments = 1 }
};

What we see is that the excerpt comes back populated and highlighted, but the length appears to be the entire length of the AttachmentText rather than a truncated version of it per the hit spec or the highlight spec. 

I've adjusted the ProjectHighlightedExcerptUsing lambda, and the excerpts in the results change correspondingly, so the projection is definitely being called for the results I am looking at. The length of the field just doesn't seem to be constrained.

Am I missing something?

Thanks!

#193577
May 31, 2018 16:38

Could you be suffering from this bug? 

https://world.episerver.com/support/Bug-list/bug/FIND-2611

#193637
Jun 01, 2018 13:59
swc

I don't believe this is caused by that bug. If I'm reading that report correctly, I'd be seeing sizes of approx. 500 chars (2 x 250 frags), but the excerpts I get back are much longer. 

I have arrived at a workaround, but I'm not clear on why it behaves the way it does.

In the search conventions, I was checking whether SearchAttachmentText().AsHighlighted() was null or empty and, if not, projecting the excerpt from that field. What I've found is that if I also pass the HighlightSpec to the AsHighlighted() call inside the IsNullOrEmpty() check, the resulting projection is the right length.

For example, if I change this:

SearchClient.Instance.Conventions.UnifiedSearchRegistry.ForInstanceOf<MediaData>()
    .ProjectHighlightedExcerptUsing<MediaData>(spec => doc => !string.IsNullOrEmpty(doc.SearchAttachmentText().AsHighlighted())
        ? doc.SearchAttachmentText().AsHighlighted(new HighlightSpec() { FragmentSize = spec.ExcerptLength, NumberOfFragments = 1 })
        : doc.SearchAttachmentText().AsCropped(spec.ExcerptLength));

to this:

SearchClient.Instance.Conventions.UnifiedSearchRegistry.ForInstanceOf<MediaData>()
    .ProjectHighlightedExcerptUsing<MediaData>(spec => doc => !string.IsNullOrEmpty(doc.SearchAttachmentText().AsHighlighted(new HighlightSpec() { FragmentSize = spec.ExcerptLength, NumberOfFragments = 1 }))
        ? doc.SearchAttachmentText().AsHighlighted(new HighlightSpec() { FragmentSize = spec.ExcerptLength, NumberOfFragments = 1 })
        : doc.SearchAttachmentText().AsCropped(spec.ExcerptLength));

Then the projected excerpt is sized differently, even though the change is to the IsNullOrEmpty() check, rather than the result. If someone can explain that, I'd be curious to understand why it behaves that way, but I'm getting back what I need now.

#193639
Jun 01, 2018 14:12