Searching phone numbers

I'm having some trouble searching for phone numbers in our Find index. I'm getting what I would consider a lot of false positives, and even when I get a hit on the phone number it is often not at the top of the results. For example, if I search for "08-12345" I get hits on a lot of other documents that only contain the "08" part. It seems that the "08" and "12345" parts are searched as separate words.

I realized that the "-" character is a reserved character in Lucene, and if I want to create a search containing that character I would probably need to escape it, as described in http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Escaping%20Special%20Characters
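
To make the escaping step concrete, here is a minimal sketch of what backslash-escaping the reserved characters does to the query string. The class and method names are made up for illustration, and this is not the implementation of EPiServer.Find.QueryEscaping; it only follows the character list on the linked Lucene page.

```csharp
using System;
using System.Text;

public static class LuceneEscapeSketch
{
    // Single characters treated as special by the Lucene query syntax.
    // Escaping & and | individually also covers the && and || operators.
    private const string Special = "\\+-!():^[]\"{}~*?|&";

    // Prefix every special character with a backslash.
    public static string Escape(string query)
    {
        var sb = new StringBuilder(query.Length * 2);
        foreach (char c in query)
        {
            if (Special.IndexOf(c) >= 0) sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Escape("08-12345")); // prints 08\-12345
    }
}
```

Note that escaping only stops the query parser from reading "-" as an operator. It says nothing about how the text is split into words afterwards.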

When I search using UnifiedSearchFor(), the search string is parsed using EPiServer.Find.QueryEscaping.Quote(), which seems to escape all the reserved characters anyway. So adding the escaping myself shouldn't be necessary. Still, the search result is less than perfect.

Am I misinterpreting the Lucene documentation, or shouldn't "08-12345" be matched as if it were one word when the "-" character is escaped?

 

#77104
Nov 08, 2013 13:59

I might be wrong here, but I think Lucene will divide the term into two separate words when indexing that string: "08-12345" will be indexed as "08" and "12345". But then you would think searching for 08\-12345 would give the phone number as the highest-ranked search hit.
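
To illustrate the guess above, here is a rough sketch of a tokenizer that breaks text on anything that is not a letter or digit. It is a simplified stand-in written for this thread, not the analyzer Find actually uses, and the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenizeSketch
{
    // Simplified stand-in for an analyzer: every run of letters or
    // digits becomes one lower-cased token; everything else is a break.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        foreach (Match m in Regex.Matches(text, @"[\p{L}\p{Nd}]+"))
        {
            tokens.Add(m.Value.ToLowerInvariant());
        }
        return tokens;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", Tokenize("Call 08-12345 today")));
        // prints: call | 08 | 12345 | today
        Console.WriteLine(string.Join(" | ", Tokenize("Area code 08 only")));
        // prints: area | code | 08 | only
    }
}
```

If the index is built this way, both texts contain the token "08", which would explain why documents with only the "08" part show up as hits.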

#77117
Nov 09, 2013 1:25

Hi Andreas,

What I usually do when faced with these "known pattern" queries (where I know how a pattern should be interpreted; in this case a phone number shouldn't be tokenized into 08 and 12345 but kept as 08-12345) is to simply wrap them in phrases, i.e. not let Lucene decide where to break tokens. So in this case I would scan the query and wrap known patterns in double quotes ("):

searchResult = client.Search<BlogPost>()
    .For("some other query tokens \"08-12345\"")
    .GetResult();
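
The "scan the query" step could look something like the sketch below. The phone-number pattern (2 to 4 digits, a hyphen, 5 to 8 digits) and the helper name are assumptions for illustration only; adjust the regex to whatever formats your content actually uses.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneQuoting
{
    // Hypothetical pattern: 2-4 digits, a hyphen, then 5-8 digits.
    // The lookarounds skip numbers that are directly next to a quote,
    // a digit or another hyphen, so already quoted numbers are left alone.
    private static readonly Regex Phone =
        new Regex(@"(?<![""\d-])\d{2,4}-\d{5,8}(?![""\d-])");

    // Wrap every phone-number-like token in double quotes.
    public static string QuotePhoneNumbers(string query)
    {
        return Phone.Replace(query, m => "\"" + m.Value + "\"");
    }

    public static void Main()
    {
        Console.WriteLine(QuotePhoneNumbers("henrik 08-12345 stockholm"));
        // prints: henrik "08-12345" stockholm
    }
}
```

The returned string can then be passed to .For(...) in place of the hand-written query. The sketch does not handle a number that sits inside a longer quoted phrase.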

/Henrik

#77402
Nov 18, 2013 16:13