Tell Find to skip stemming for certain words?

David Tellander

Hello,

We have a problem where the stemming of certain Swedish words produces search results with a large number of false positives. An example is the word demens (dementia), which gets stemmed to dem (them), a very common word that is totally unrelated to dementia. It is so insignificant, in fact, that it should be considered a stop word. I know that some Lucene stemmers support excluding words from stemming by marking them as keywords. Is there a way to tell Find to do this?
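To make the problem concrete, here is a minimal Python sketch of the idea. It is only an illustration: the suffix list is a deliberately tiny, hypothetical one (not the real Swedish stemmer), and `naive_stem`, `stem_with_exclusions` and `PROTECTED` are made-up names, not part of Find or Lucene. It shows how a protected-word list, similar in spirit to Lucene's keyword marking, would keep demens from collapsing into dem.

```python
# Toy illustration only: NOT the real Swedish stemmer and NOT Find's API.
# The suffix list below is a hypothetical, deliberately tiny one.
SUFFIXES = ("ens", "en", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_with_exclusions(word: str, protected: frozenset) -> str:
    """Stem a word unless it appears in the protected (keyword) list."""
    lowered = word.lower()
    if lowered in protected:
        # Protected words bypass the stemmer and are indexed unchanged.
        return lowered
    return naive_stem(lowered)


# Hypothetical exclusion list; in practice it would be maintained per language.
PROTECTED = frozenset({"demens"})

print(naive_stem("demens"))                       # collides with the pronoun "dem"
print(stem_with_exclusions("demens", PROTECTED))  # left unstemmed
print(stem_with_exclusions("bilens", PROTECTED))  # other words are still stemmed
```

The exclusion has to be applied the same way at index time and at query time, otherwise a query for demens would no longer match the indexed term.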

/David

#175457

Edited, Feb 21, 2017 10:02

Motti Rapoport

Hi,

We have the same issue. Is there an answer or solution to this? Maybe a way to disable stemming without disabling language support?

In this blog post Viktor Sahlström asks users and partners to help you guys with a list of exceptions for the Find stemming, to keep it updated.

Aside from "dem", which David mentions above, we have also found large numbers of false positives with "ska", "era" and "för", which could be added to the list.

/Motti

#175755

Edited, Mar 01, 2017 10:42

David Tellander

I left a comment on Viktor's post now, referencing this thread. This is a tricky issue since both 'ska' (the music style) and 'era' (time span) could be perfectly valid results depending on context, just like the 'banan' example in Viktor's post.

#175760

Mar 01, 2017 12:00

Motti Rapoport

We managed to disable stemming and retain language support by using Language.None as a parameter for UnifiedSearch and setting ContentLanguage.PreferredCulture to whatever language Find should get content for.

#175901

Mar 06, 2017 10:44

Henrik Fransas

@david, I have added a feature request for something similar:

http://world.episerver.com/forum/developer-forum/Feature-requests/Thread-Container/2017/1/be-able-to-filter-out-stopwords-for-all-search-not-only-morelike/

#175904

Mar 06, 2017 11:09