Calling all developers! We invite you to provide your input on Feature Experimentation by completing this brief survey.

 

Viktor Sahlström
Jan 13, 2016
  6935
(3 votes)

Improving attachment search relevancy using the attachment helper

The way that Episerver Find treats attachments is not optimal in all scenarios. If you are a large company with lots of documents in many languages, you might notice that the search hits are not always optimal when trying to find a document. This is due to how Find handles indexing of attachments.

When indexing an attachment, the file is sent to Find as a base64 encoded string. The string is parsed in Apache Tika, and the resulting text is indexed using the standard language analyzer. This approach creates several issues.

  • A lot of data passed from the client to Find is not really needed, like images in a pdf. This causes an unnecessary flow of data over the network.
  • The parsed text is only indexed using the standard analyzer. This significantly reduces the quality of hits if the attachment content is not written in the standard language.
  • While browsing indexed content using the explorer view in the Find edit mode. For an attachment, the actual text is not put in the document but rather the base64 representation of the document. This might make it hard to browse the index and to verify that the correct optimization is done.

 

To solve these issues in one go, the Attachment Helper interface was created. This interface lets the developer decide how to handle attachments. Out of the box, there is an implementation created by Episerver using the Windows built-in IFilter features here. This version supports a wide range of file types and is easy to get going. 

Install the nuget package and the IFilters that suit your needs and, suddenly, your attachment search experience is vastly improved. You might notice that network traffic is reduced when running the index job, your searches provide more relevant hits, and the Find administrator can view the document content from inside the Find admin UI.

For more details on the search attachment filter, check out the docs.

Jan 13, 2016

Comments

Please login to comment.
Latest blogs
Find and delete non used media and blocks

On my new quest to play around with Blazor and MudBlazor I'm going back memory lane and porting some previously plugins. So this time up is my plug...

Per Nergård (MVP) | Jan 21, 2025

Optimizely Content Graph on mobile application

CG everywhere! I pull schema from our default index https://cg.optimizely.com/app/graphiql?auth=eBrGunULiC5TziTCtiOLEmov2LijBf30obh0KmhcBlyTktGZ in...

Cuong Nguyen Dinh | Jan 20, 2025

Image Analyzer with AI Assistant for Optimizely

The Smart Image Analyzer is a new feature in the Epicweb AI Assistant for Optimizely CMS that automates the management of image metadata, such as...

Luc Gosso (MVP) | Jan 16, 2025 | Syndicated blog

How to: create Decimal metafield with custom precision

If you are using catalog system, the way of creating metafields are easy – in fact, you can forget about “metafields”, all you should be using is t...

Quan Mai | Jan 16, 2025 | Syndicated blog

Level Up with Optimizely's Newly Relaunched Certifications!

We're thrilled to announce the relaunch of our Optimizely Certifications—designed to help partners, customers, and developers redefine what it mean...

Satata Satez | Jan 14, 2025

Introducing AI Assistance for DBLocalizationProvider

The LocalizationProvider for Optimizely has long been a powerful tool for enhancing the localization capabilities of Optimizely CMS. Designed to ma...

Luc Gosso (MVP) | Jan 14, 2025 | Syndicated blog