Fuzzy search: MatchFuzzy vs FuzzyQuery

Hi,

I'm trying to use fuzzy queries to suggest correct words to users when a search term is typed incorrectly. I built a test index that has the words "excursion", "prioritized", and "travel" in the title (the field name in the index is "Title$$string") and am trying to run the following tests:

        [TestCase("excrusion")]
        [TestCase("excrusio")]
        [TestCase("excursio")]
        [TestCase("Piroritized")]
        [TestCase("piroritized")]
        [TestCase("prior")]
        [TestCase("Prioritized")]
        [TestCase("infromation")]
        [TestCase("*Piroritized*")]
        public void Suggest_MatchFuzzy(string input)
        {
            var q = Client.Search<InformationIndexModel>()
                .Filter(x => x.Title.MatchFuzzy(input))
                .WithLogToConsole("fuzzy-suggest-epifind-match-fuzzy", input);

            var result = q.GetResult().ToList().LogTitlesToConsole();

            result.Count.Should().BePositive();
        }

It appears that only one query returns non-empty results, which looks incorrect. The query sent to the API is:

{
  "query": {
    "constant_score": {
      "filter": {
        "query": {
          "fuzzy": {
            "Title$$string": {
              "value": "infromation"
            }
          }
        }
      }
    }
  }
}

My second attempt was to use an extension similar to the one described at http://joelabrahamsson.com/wildcard-queries-with-episerver-find/:

        public static ITypeSearch<T> FuzzyFilter<T>(
            this ITypeSearch<T> search,
            Expression<Func<T, string>> fieldSelector,
            string almost,
            double? minSimilarity = null,
            double? boost = null)
        {
            // Resolve the analyzed field name (e.g. "Title$$string.standard")
            var fieldName = search.Client.Conventions
                .FieldNameConvention
                .GetFieldNameForAnalyzed(fieldSelector);

            var fuzzyQuery = new FuzzyQuery(fieldName, almost.ToLowerInvariant())
            {
                MinSimilarity = minSimilarity,
                Boost = boost.GetValueOrDefault(1)
            };

            //Add it to the search request body
            return new Search<T, IQuery>(search, context =>
            {
                if (context.RequestBody.Query != null)
                {
                    // Combine with the existing query: at least one of the two must match
                    var boolQuery = new BoolQuery();
                    boolQuery.Should.Add(context.RequestBody.Query);
                    boolQuery.Should.Add(fuzzyQuery);
                    boolQuery.MinimumNumberShouldMatch = 1;
                    context.RequestBody.Query = boolQuery;
                }
                else
                {
                    context.RequestBody.Query = fuzzyQuery;
                }
            });
        }

and the test is:

        [TestCase("excrusion")]
        [TestCase("excrusio")]
        [TestCase("excursio")]
        [TestCase("Piroritized")]
        [TestCase("piroritized")]
        [TestCase("prior")]
        [TestCase("Prioritized")]
        [TestCase("infromation")]
        [TestCase("*Piroritized*")]
        public void Suggest_FuzzyFilter(string input)
        {
            var q = Client.Search<InformationIndexModel>()
                .FuzzyFilter(x => x.Title, input)
                .WithLogToConsole("fuzzy-suggest-all-but-ports-fuzzy-filter", input);

            var result = q.GetResult().ToList().LogTitlesToConsole();

            result.Count.Should().BePositive();
        }

This returns results for all inputs except "excrusio", which is probably correct. The query sent to the server is:

{
  "query": {
    "fuzzy": {
      "Title$$string.standard": {
        "value": "infromation",
        "boost": 1.0
      }
    }
  }
}
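To sanity-check which inputs I should expect to match "excursion", here is a minimal sketch that computes the plain Levenshtein distance between each test input and the indexed word. This is only an approximation on my side: the class and method names are made up for illustration, and I am assuming (not asserting) that the server-side fuzzy matching is based on a comparable edit-distance measure.

```csharp
using System;

public static class EditDistanceDemo
{
    // Classic Levenshtein distance: insertions, deletions and substitutions cost 1 each.
    // Swapping two adjacent characters therefore counts as 2 edits.
    public static int Levenshtein(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Distance from the empty prefix of a to each prefix of b
        for (var j = 0; j <= b.Length; j++)
        {
            previous[j] = j;
        }

        for (var i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (var j = 1; j <= b.Length; j++)
            {
                var cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }

            // Reuse the two rows instead of allocating a full matrix
            var swap = previous;
            previous = current;
            current = swap;
        }

        return previous[b.Length];
    }

    public static void Main()
    {
        var inputs = new[] { "excrusion", "excrusio", "excursio" };
        foreach (var input in inputs)
        {
            Console.WriteLine("{0} -> {1}", input, Levenshtein(input.ToLowerInvariant(), "excursion"));
        }
    }
}
```

By this measure "excursio" is 1 edit away from "excursion", "excrusion" is 2, and "excrusio" is 3, which would be consistent with "excrusio" being the only input that finds nothing.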

Then I tried adjusting minSimilarity, but the results were the same.
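For reference, this is roughly the request I would expect once minSimilarity is set. The "min_similarity" property name is an assumption on my part, based on the fuzzy query in older Elasticsearch versions; I have not confirmed that this is what gets serialized, and 0.7 is just an example value.

```json
{
  "query": {
    "fuzzy": {
      "Title$$string.standard": {
        "value": "infromation",
        "boost": 1.0,
        "min_similarity": 0.7
      }
    }
  }
}
```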

So question number one is: what should a correct query look like?

Question number two is more specific: in the query above (which looks almost right to me) I see "Title$$string.standard", which probably refers to the "standard" analyzer. Is there a quick way to fix this and get better control through the minSimilarity field?

#132745
Aug 18, 2015 16:21
For some reason the collapsed code blocks are not shown in my original post (Chrome), so I am duplicating them here:

The query sent to the API by .Filter(x => x.Title.MatchFuzzy(input)) is:

{
  "query": {
    "constant_score": {
      "filter": {
        "query": {
          "fuzzy": {
            "Title$$string": {
              "value": "infromation"
            }
          }
        }
      }
    }
  }
}

The second test, which uses the .FuzzyFilter extension, is:

        [TestCase("excrusion")]
        [TestCase("excrusio")]
        [TestCase("excursio")]
        [TestCase("Piroritized")]
        [TestCase("piroritized")]
        [TestCase("prior")]
        [TestCase("Prioritized")]
        [TestCase("infromation")]
        [TestCase("*Piroritized*")]
        public void Suggest_FuzzyFilter(string input)
        {
            var q = Client.Search<InformationIndexModel>()
                .FuzzyFilter(x => x.Title, input)
                .WithLogToConsole("fuzzy-suggest-all-but-ports-fuzzy-filter", input);

            var result = q.GetResult().ToList().LogTitlesToConsole();

            result.Count.Should().BePositive();
        }

And finally, the query sent to the API by this extension is:

{
  "query": {
    "fuzzy": {
      "Title$$string.standard": {
        "value": "infromation",
        "boost": 1.0
      }
    }
  }
}
#132746
Edited, Aug 18, 2015 16:27