
Deane Barker
Dec 7, 2010

Integrated Lucene Search for EPiServer

We recently implemented a site that had more PDF files than pages, in a ratio of about 6-to-1.  Sadly, this process revealed some of the shortcomings in EPiServer’s default search architecture.

The biggest problem we found is that there are two search algorithms at work.  Pages are indexed via a custom-built tokenizer, with their keywords stored in a database table.  Files, on the other hand, are indexed and searched via Lucene.Net.

Given that there are two separate methodologies, it becomes effectively impossible to mix results.  When using the SearchDataSource, all page results are returned first, and all file results come after them.  This is a significant issue for my client.  Given their mix of content formats, visitors are often looking for a PDF file first, and an HTML page second.

There is a “magic” property on all search result pages called “PageRank.”  We were excited when we found this, because we figured we could use LINQ to order by this property.  Unfortunately, this property is fairly crippled because all files have the value of “500.”  We reflected through the DLL and found that this is hard-coded and not changeable.
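
To make the problem concrete, here is a minimal sketch of what ordering by PageRank does when every file carries the same value.  The Hit class and the page scores are made up for illustration; they are not EPiServer types or real data.  Only the fixed file value of 500 comes from what we observed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a search hit; not an EPiServer type.
public sealed class Hit
{
    public string Name { get; }
    public int PageRank { get; }

    public Hit(string name, int pageRank)
    {
        Name = name;
        PageRank = pageRank;
    }
}

public static class PageRankDemo
{
    // Files always report 500, as observed in the default search results.
    public const int FixedFileRank = 500;

    // Orders hits by PageRank, highest first, and returns their names.
    // LINQ's OrderByDescending is a stable sort, so hits with equal rank
    // keep their incoming order.
    public static List<string> OrderByRank(IEnumerable<Hit> hits)
    {
        return hits.OrderByDescending(h => h.PageRank)
                   .Select(h => h.Name)
                   .ToList();
    }
}
```

Given two pages with illustrative ranks of 720 and 310 and two files at the fixed 500, OrderByRank places both files together between the two pages, whatever their actual relevance to the query.  The sort runs, but it cannot tell one file from another.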

When we brought this to their attention, EPiServer offered to do a hotfix, but we weren’t sure it would help.  Even if this value weren’t always the same, how would we know whether the PageRank value for files has a directly proportional relationship to the PageRank value of the page results?  These two values are calculated completely differently, and we have no way of knowing if they compare to each other in any meaningful way.  (For example, if a page result has a PageRank of 500 and a file result has a PageRank of 1000, does it follow that the file is twice as relevant as the page?)

So, the bottom line is that we needed to get pages and files indexed and searched using the same algorithm.  This way, we can compare them and know that the relative values have some relationship.

Files are indexed via the EPiServer Indexing Service, and this offered the fewest options to change.  So, we decided to keep that as-is, and simply created another Lucene.Net index for pages.  Then, we abandoned EPiServer’s SearchDataSource control, and implemented a new searching API that uses raw Lucene.Net calls to query both sets of indexes – those from files (created by the EPiServer Indexing Service) and pages (created by our new plugin).

Code to search looks like this:

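// Query the page index (SitePages) together with the file indexes.
LuceneSearchManager search = new LuceneSearchManager();
search.SearchIndexes = "SitePages,SiteGlobalFiles,SiteDocuments";

// Run the search; page and file results come back in one collection.
LuceneSearchResultCollection resultCollection = search.Search("episerver");

// Bind the combined results to the SearchResults control.
SearchResults.DataSource = resultCollection.Results;
SearchResults.DataBind();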

In this situation, “SitePages” is an extra VPP created just to store the Lucene index of pages.

Preliminary tests have been quite good.  Both files and pages are indexed and searched using the same methodology, and result sets are ordered by rank, regardless of whether they’re page or file results.  The API supports paging, and also supports EPiServer’s security model (the actual enforcement of security was modeled after how EPiServer does it in SearchDataSource – it catches an expected exception at one point, which I’m not in love with, but it works).  PageTypeNames and all ancestor IDs back to the Start Page are stored, and can be used for searching and filtering as well.

Additionally, since you have low-level access to the underlying Lucene.Net index, you have the full power of Lucene.Net at your disposal, including search-time field boost to bias results.
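
As an illustration of search-time boosting, the sketch below builds a query string in Lucene's standard query syntax, where a trailing ^N on a clause raises its weight.  The BoostedQuery helper and the field names in the usage note are hypothetical; they are not part of our API, and the fields your index actually contains may differ.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, shown only to illustrate Lucene's "field:term^boost" syntax.
public static class BoostedQuery
{
    // Builds one "field:term^boost" clause per field and joins them with spaces.
    // Assumes a single-word term that has already been escaped for the query parser.
    public static string Build(string term, IEnumerable<KeyValuePair<string, float>> fieldBoosts)
    {
        return string.Join(" ", fieldBoosts.Select(fb =>
            string.Format(CultureInfo.InvariantCulture, "{0}:{1}^{2}", fb.Key, term, fb.Value)));
    }
}
```

Calling Build with the term "episerver" and the pairs (PageName, 3) and (MainBody, 1.5) yields a string that weights a match in the hypothetical PageName field twice as heavily as one in MainBody.  The resulting string would then be handed to whatever query parser your search code already uses.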

This has not been deployed to production yet, but will likely see the light of day in Q1 2011.  Consider it an alpha release.  If you implement it, we welcome all feedback.

Lucene Search Download


Comments

Dec 7, 2010 11:17 AM

Nice solution!

We are aware that the mixed-search model is a pain point and we will address it. This will happen after the release of EPiServer CMS 6 R2.

Jeff Wallace Dec 8, 2010 08:03 PM

Nice work. :)

Aug 11, 2011 04:00 PM

Hi Deane!

I'm currently testing Lucene via the EPiServer FTS module and the facade code from EPiServer Nuget (ref. http://world.episerver.com/Blogs/Paul-Smith/Dates1/2011/5/EPiServer-Full-Text-Search-Now-Available-For-CMS-6-R2/).
For comparison I'd very much like to check out your solution but the download link is no longer working. Can you update the link?
