Drew Null
Apr 15, 2018

Episerver Find Wildcard Queries and Best Bets

I have used the approach detailed in Joel Abrahamsson's 2012 blog post, Wildcard Queries with Episerver Find, for quite a while. The Episerver Find built-in WildcardQuery has some important advantages. Notably, it provides a means to boost results that have wildcard search hits against a specific field or set of fields. But, in practice, wildcards are only one piece in the puzzle of constructing a good search experience for the user. 

The purpose of this blog post is to address some of the challenges that come up when using WildcardQuery: 

  • Best Bets
  • Multiple Fields
  • Multiple Words
  • Apostrophes

Getting Started

The code block below is the base query that we'll be working with. For the uninitiated, I've taken Joel's extension method and made one key update: asterisks are added to the query string within the method itself.  

public static ITypeSearch<T> WildcardSearch<T>(this ITypeSearch<T> search,
    string query, Expression<Func<T, string>> fieldSelector, double? boost = null)
{
    // Lowercase the term, then wrap it in asterisks so it can match inside a word.
    query = query?.ToLowerInvariant();
    query = WrapInAsterisks(query);

    // Resolve the index field name for the analyzed version of the property.
    var fieldName = search.Client.Conventions
        .FieldNameConvention
        .GetFieldNameForAnalyzed(fieldSelector);

    var wildcardQuery = new WildcardQuery(fieldName, query)
    {
        Boost = boost
    };

    return new Search<T, WildcardQuery>(search, context =>
    {
        if (context.RequestBody.Query != null)
        {
            // A query already exists (e.g. from For()): OR it with the wildcard query.
            var boolQuery = new BoolQuery();
            boolQuery.Should.Add(context.RequestBody.Query);
            boolQuery.Should.Add(wildcardQuery);
            boolQuery.MinimumNumberShouldMatch = 1;
            context.RequestBody.Query = boolQuery;
        }
        else
        {
            // No existing query: the wildcard query stands alone.
            context.RequestBody.Query = wildcardQuery;
        }
    });
}

// Strips surrounding whitespace and asterisks, then wraps the input in asterisks.
// Null or blank input becomes a lone "*".
public static string WrapInAsterisks(string input)
{
    return string.IsNullOrWhiteSpace(input) ? "*" : $"*{input.Trim().Trim('*')}*";
}

In Joel's version, the asterisks were added by the consuming code. But here, if the query "viol" is passed, it will convert it to "*viol*" itself, which will match against both of the words "violin" and "viola". 
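
To make the wrapping behavior concrete, here is a minimal, self-contained sketch that does not touch Find at all. WrapInAsterisks is copied from above so the snippet compiles on its own; MatchesTerm is a hypothetical helper of mine that only approximates how a "*" wildcard pattern is compared against a single indexed term (it does not handle "?").

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardDemo
{
    // Same logic as WrapInAsterisks in the post, repeated so this sketch stands alone.
    public static string WrapInAsterisks(string input)
    {
        return string.IsNullOrWhiteSpace(input) ? "*" : $"*{input.Trim().Trim('*')}*";
    }

    // Hypothetical helper (not part of Find): treats '*' as "any run of characters"
    // and tests the pattern against one whole term.
    public static bool MatchesTerm(string pattern, string term)
    {
        // Regex.Escape turns '*' into "\*", which is then swapped for ".*".
        string regex = "^" + Regex.Escape(pattern).Replace("\\*", ".*") + "$";
        return Regex.IsMatch(term, regex);
    }
}
```

Under these assumptions, WildcardDemo.WrapInAsterisks("viol") and WildcardDemo.WrapInAsterisks(" *viol* ") both yield "*viol*", and that pattern matches the terms "violin" and "viola" but not "cello".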

This extension method can be called as follows: 

string query = "viol";
double pageNameBoost = 1.5;
var result = SearchClient.Instance.Search<PageData>()
    .WildcardSearch(query, x => x.PageName, pageNameBoost)
    .GetPagesResult();

Best Bets

One of the challenges of using wildcards is getting them to work with Episerver Find's Best Bets. Because wildcard queries use query strings with asterisks, best bets do not match. Consider the following example...

Say we have defined a Best Bet with the phrases "violin", "viola", and "viol" that points to a music teacher profile page: "Chen, L.", our primary music teacher for violins and violas. Whenever a user searches for "viol", the Best Bet is found, and the "Chen, L." teacher profile page appears at the top of the results.

But our site requirements also state that search should support partial word matches, which leads us to use the WildcardSearch() method defined above.

This is a problem because Best Bets are not wildcard enabled. Best Bet lookup doesn't treat an asterisk any differently than, say, an "a" or a "3". So when our WildcardSearch() method passes the phrase "*viol*" to Find, the string doesn't match on any Best Bet, and the "Chen, L." teacher profile page does not (necessarily) appear at the top of the results.
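
The mismatch can be pictured with a toy model. The sketch below is not how Find implements Best Bets; it only assumes, as described above, that the lookup compares the incoming phrase literally against the configured phrases. The class name and the phrase set are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: stands in for a literal Best Bet phrase lookup.
public static class BestBetLookupDemo
{
    // Hypothetical phrases, mirroring the "Chen, L." example.
    private static readonly HashSet<string> Phrases =
        new HashSet<string> { "violin", "viola", "viol" };

    // An asterisk is just another character here, so "*viol*" is a different phrase.
    public static bool HasBestBet(string query)
    {
        return query != null && Phrases.Contains(query.Trim());
    }
}
```

In this model, BestBetLookupDemo.HasBestBet("viol") is true while BestBetLookupDemo.HasBestBet("*viol*") is false, which is why the raw, unwrapped query has to reach the Best Bet lookup.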

Note that the Find admin UI does not permit special characters, so even if we wanted to add a best bet for "*viol*" -- not that we should -- the system wouldn't allow it.

Fortunately, Best Bets can be added by chaining a plain vanilla For() to the search object. In our consuming code: 

string query = "viol";
double pageNameBoost = 1.5;
var result = SearchClient.Instance.Search<PageData>()
    .For(query)
    .InField(x => x.PageName)
    .ApplyBestBets()
    .WildcardSearch(query, x => x.PageName, pageNameBoost)
    .GetPagesResult();

Although repetitive, this works because WildcardSearch() ORs the query generated by For() with the WildcardQuery it uses under the hood. That is the purpose of the BoolQuery and this line: 

boolQuery.Should.Add(context.RequestBody.Query);

InField() ensures that we only search against the field we are passing to WildcardSearch(), and avoid false positives from searching against the built-in All field.
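
The ORing described above can be sketched without Find. The class below is a toy model of a "should" list with a minimum match count; it is not the Find BoolQuery class, and the two clauses are simplified stand-ins of my own (whole-term equality for the For() query, substring matching for the WildcardQuery), ignoring stemming and scoring.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of "should" clauses; not the Find BoolQuery class.
public sealed class ShouldClauseModel
{
    public List<Func<string, bool>> Should { get; } = new List<Func<string, bool>>();
    public int MinimumNumberShouldMatch { get; set; } = 1;

    // A term matches when at least MinimumNumberShouldMatch clauses accept it.
    public bool Matches(string term)
    {
        return Should.Count(clause => clause(term)) >= MinimumNumberShouldMatch;
    }

    // Hypothetical factory that mimics For(query) ORed with a "*query*" wildcard.
    public static ShouldClauseModel ForWithWildcard(string query)
    {
        var model = new ShouldClauseModel();
        // Stand-in for the For() query: whole-term match only.
        model.Should.Add(term => string.Equals(term, query, StringComparison.OrdinalIgnoreCase));
        // Stand-in for the WildcardQuery: the query may appear anywhere in the term.
        model.Should.Add(term => term.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0);
        return model;
    }
}
```

With the default minimum of 1, ShouldClauseModel.ForWithWildcard("viol") accepts both "viol" (both clauses) and "violin" (wildcard clause only). Raising MinimumNumberShouldMatch to 2 would turn the OR into an AND and drop "violin".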

We can tighten up reusability by putting these additional chains into another extension method:

public static ITypeSearch<T> ForWithWildcards<T>(this ITypeSearch<T> search,
    string query, Expression<Func<T, string>> fieldSelector, double? boost = null)
{
    return search
        .For(query)
        .InField(fieldSelector)
        .ApplyBestBets()
        .WildcardSearch(query, fieldSelector, boost);
}

It would be called by the following code: 

string query = "viol";
double pageNameBoost = 1.5;
var result = SearchClient.Instance.Search<PageData>()
    .ForWithWildcards(query, x => x.PageName, pageNameBoost)
    .GetPagesResult();

I like to keep WildcardSearch() separate from ForWithWildcards() for situations where I need to provide my own sort order instead of sorting by score. Since Best Bets are irrelevant without score, I can spare Find the load of processing the QueryStringQuery created in For().

Side note: When the requirements call for Best Bets to appear at the top of a custom sorted set of results, you can retrieve Best Bets from BestBetRepository. BestBetRepository lives in the EPiServer.Find.Framework.BestBets namespace, and can be injected (or service located) into your consuming service.

Multiple Fields

With some minor refactoring, ForWithWildcards() and WildcardSearch() can accept multiple fields. In C# 7, System.ValueTuple -- which you can install from NuGet -- makes this a trivial effort:

public static ITypeSearch<T> ForWithWildcards<T>(this ITypeSearch<T> search,
    string query, params (Expression<Func<T, string>>, double?)[] fieldSelectors)
{
    return search
            .For(query)
            .InFields(fieldSelectors.Select(x => x.Item1).ToArray())
            .ApplyBestBets()
            .WildcardSearch(query, fieldSelectors);
}

public static ITypeSearch<T> WildcardSearch<T>(this ITypeSearch<T> search,
    string query, params (Expression<Func<T, string>>, double?)[] fieldSelectors)
{
    query = query?.ToLowerInvariant();
    query = WrapInAsterisks(query);

    var wildcardQueries = new List<WildcardQuery>();

    foreach (var fieldSelector in fieldSelectors)
    {
        string fieldName = search.Client.Conventions
            .FieldNameConvention
            .GetFieldNameForAnalyzed(fieldSelector.Item1);

        wildcardQueries.Add(new WildcardQuery(fieldName, query)
        {
            Boost = fieldSelector.Item2
        });
    }

    return new Search<T, WildcardQuery>(search, context =>
    {
        var boolQuery = new BoolQuery();

        if (context.RequestBody.Query != null)
        {
            boolQuery.Should.Add(context.RequestBody.Query);
        }

        foreach (var wildcardQuery in wildcardQueries)
        {
            boolQuery.Should.Add(wildcardQuery);
        }

        boolQuery.MinimumNumberShouldMatch = 1;
        context.RequestBody.Query = boolQuery;
    });
}

The calling code would then look something like this (depending on which fields you want to search against): 

var result = SearchClient.Instance.Search<PageData>()
    .ForWithWildcards("viol", 
        (x => x.PageName, 1.5),
        (x => x.SearchText(), null));

ValueTuple can, of course, be replaced with your own strongly typed class, but I have used it here for brevity.
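
For illustration, a strongly typed replacement might look like the sketch below. The names WildcardField and SamplePage are hypothetical; nothing here comes from Find itself.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical replacement for the (selector, boost) tuple.
public sealed class WildcardField<T>
{
    public WildcardField(Expression<Func<T, string>> selector, double? boost = null)
    {
        Selector = selector ?? throw new ArgumentNullException(nameof(selector));
        Boost = boost;
    }

    // The field to search against (Item1 in the tuple version).
    public Expression<Func<T, string>> Selector { get; }

    // Optional boost for hits in this field (Item2 in the tuple version).
    public double? Boost { get; }
}

// Hypothetical document type, used only to exercise the class above.
public sealed class SamplePage
{
    public string PageName { get; set; }
}
```

The extension methods would then take params WildcardField<T>[] and read Selector and Boost instead of Item1 and Item2, for example with new WildcardField<SamplePage>(x => x.PageName, 1.5). The trade-off is more ceremony at the call site in exchange for named members.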

Multiple Words and Apostrophes

In our example above, we used the query string "viol", which WildcardSearch() mutates into "*viol*". But what if the user searches for, say, "viol lessons"? In the code above, this will become "*viol lessons*", which will not match against "violin" or "viola".

I like to solve this problem by splitting the query string, by whitespace, into an array, and then ORing a separate WildcardQuery per word. This is done in our WildcardSearch()... 

var words = query.Split(new [] { " " }, StringSplitOptions.RemoveEmptyEntries)
    .Select(WrapInAsterisks)
    .ToList();

...

foreach (var word in words)
{
    wildcardQueries.Add(new WildcardQuery(fieldName, word)
    {
        Boost = fieldSelector.Item2
    });
}

Another challenge is presented by apostrophes. The Find (Elasticsearch) standard analyzer interprets apostrophes as whitespace, so "Chen's" is indexed as "Chen s". This works with both plurals -- thanks to stemming -- and possessives, but causes trouble with other words that contain apostrophes.

For example, the name "O'Reilly Books" is indexed as "O Reilly Books". This presents a pattern matching issue for our WildcardSearch() -- and Find in general -- because the code above will mutate "O'Reilly Books" into "*o'reilly*" and "*books*", which Find will then interpret as "*o reilly*" and "*books*". If the user searches for "O'Reilly", then "O'Whatever" will also appear in the result list.

To address this scenario, I like to convert apostrophes into asterisks. "O'Reilly Books" becomes "*o*reilly*" and "*books*" (note that there are no spaces in "*o*reilly*"). Searches for "O'Reilly Books" do match "O'Reilly", do not match "O'Whatever", and don't interfere with plurals or possessives.

query = query.ToLowerInvariant().Replace('\'', '*');
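
Putting the word splitting and the apostrophe replacement together, the term-building step can be tried in isolation. This is a minimal sketch with no Find dependency; WildcardTerms and Build are hypothetical names, and WrapInAsterisks is copied from the post so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WildcardTerms
{
    // Same logic as WrapInAsterisks in the post.
    public static string WrapInAsterisks(string input)
    {
        return string.IsNullOrWhiteSpace(input) ? "*" : $"*{input.Trim().Trim('*')}*";
    }

    // Lowercases the query, turns apostrophes into asterisks, splits on spaces,
    // and wraps each word. A blank query yields no terms.
    public static List<string> Build(string query)
    {
        if (string.IsNullOrWhiteSpace(query))
            return new List<string>();

        return query.ToLowerInvariant()
            .Replace('\'', '*')
            .Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
            .Select(WrapInAsterisks)
            .ToList();
    }
}
```

With this helper, WildcardTerms.Build("viol lessons") produces "*viol*" and "*lessons*", and WildcardTerms.Build("O'Reilly Books") produces "*o*reilly*" and "*books*". Because WrapInAsterisks wraps both sides, each term also carries a leading asterisk.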

With multiple words and apostrophes accounted for, the final extension method code is the following: 

public static ITypeSearch<T> ForWithWildcards<T>(this ITypeSearch<T> search,
    string query, params (Expression<Func<T, string>>, double?)[] fieldSelectors)
{
    return search
            .For(query)
            .InFields(fieldSelectors.Select(x => x.Item1).ToArray())
            .ApplyBestBets()
            .WildcardSearch(query, fieldSelectors);
}

public static ITypeSearch<T> WildcardSearch<T>(this ITypeSearch<T> search,
    string query, params (Expression<Func<T, string>>, double?)[] fieldSelectors)
{
    // Nothing to search for: leave the search untouched.
    if (string.IsNullOrWhiteSpace(query))
        return search;

    // Lowercase the query and turn apostrophes into wildcards.
    query = query.ToLowerInvariant().Replace('\'', '*');

    // One wrapped term per whitespace-separated word.
    var words = query.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
        .Select(WrapInAsterisks)
        .ToList();

    var wildcardQueries = new List<WildcardQuery>();

    // Build one WildcardQuery per field and word combination.
    foreach (var fieldSelector in fieldSelectors)
    {
        string fieldName = search.Client.Conventions
            .FieldNameConvention
            .GetFieldNameForAnalyzed(fieldSelector.Item1);

        foreach (var word in words)
        {
            wildcardQueries.Add(new WildcardQuery(fieldName, word)
            {
                Boost = fieldSelector.Item2
            });
        }
    }

    return new Search<T, WildcardQuery>(search, context =>
    {
        var boolQuery = new BoolQuery();

        // Keep any existing query (e.g. from For()) as one of the OR clauses.
        if (context.RequestBody.Query != null)
        {
            boolQuery.Should.Add(context.RequestBody.Query);
        }

        foreach (var wildcardQuery in wildcardQueries)
        {
            boolQuery.Should.Add(wildcardQuery);
        }

        // At least one clause must match for a document to be returned.
        boolQuery.MinimumNumberShouldMatch = 1;
        context.RequestBody.Query = boolQuery;
    });
}

public static string WrapInAsterisks(string input)
{
    return string.IsNullOrWhiteSpace(input) ? "*" : $"*{input.Trim().Trim('*')}*";
}

Enjoy!

Comments

Apr 15, 2018 09:37 PM

Nice write up, thanks for sharing! It’s worth pointing out that it’s possible to customise how best bets are matched so you may be able to do some customisation around wildcards. I wrote about customising best bets here: https://www.david-tec.com/2017/07/customising-best-bet-behaviour-in-episerver-find/

Henrik Fransas Apr 16, 2018 09:20 AM

Nice write up.

Quick question: is it necessary to do all those ToLowerInvariant calls? Isn't there any way to send in something like ignoreva...., or does Find care about lowercase or capital letters?

Glenn Lalas Nov 20, 2019 12:48 AM

Just wanted to chime in to say that even a year and a half later this blog post is still super helpful.  Thanks Drew, great stuff!

sheider Jan 26, 2021 09:29 PM

This is a fantastic post! Thanks Drew!

Ashish Rasal Jun 13, 2021 05:38 AM

Thanks for sharing this most useful post. It helped me search for documents that have a hyphen in their names.

Thanks again.

Nat Jan 11, 2022 10:45 AM

really helpful post, thanks.

although I am finding that in results, it is now highlighting much more than the actual search term itself, is this expected?

Kaspars Ozols Aug 15, 2022 09:26 PM

Important thing to note - the wildcard search seems to be case sensitive. 

Adding ".lowercase" postfix to the field name helped to achieve the required result.

string fieldName = search.Client.Conventions
    .FieldNameConvention
    .GetFieldNameForAnalyzed(fieldSelector.Item1) + ".lowercase";
