We have a problem where the stemming of certain Swedish words produces search results with a huge number of false positives. An example is the word "demens" (dementia), which gets stemmed to "dem" (them), a very common word that is totally unrelated to dementia. It is so insignificant, in fact, that it should be considered a stop word. I know that some Lucene stemmers support excluding words via keywords, but is there a way to tell Find to do this?
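To illustrate what keyword exclusion means here, below is a minimal Python sketch. It is not the stemmer that Lucene or Find actually uses: `toy_stem`, `stem_with_keywords`, the suffix list, and the `PROTECTED` set are all hypothetical names made up for this example. The idea is only that words on a protected list bypass the stemmer and are indexed as-is.

```python
# Toy illustration only: a crude suffix stripper standing in for a real
# Swedish stemmer. The suffix list and all names are assumptions.

def toy_stem(word, suffixes=("ens", "en", "er", "ar")):
    """Strip the first matching suffix, keeping at least 3 leading letters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def stem_with_keywords(word, keywords):
    """Stem a word unless it is on the protected keyword list."""
    lowered = word.lower()
    if lowered in keywords:
        # Protected words skip stemming entirely.
        return lowered
    return toy_stem(lowered)


# Hypothetical exclusion list; in practice it would hold the problem words.
PROTECTED = frozenset({"demens"})

plain = toy_stem("demens")                          # collides with "dem"
protected = stem_with_keywords("demens", PROTECTED)  # left untouched
```

With the protected list in place, a search for "demens" would no longer share a stem with "dem", while other words would still be stemmed as before.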
We have the same issue. Is there an answer or solution to this? Maybe a way to disable stemming without disabling language support?
In this blog post, Viktor Sahlström asks users and partners to help you guys keep the list of exceptions for the Find stemming up to date.
Aside from "dem", which David mentions above, we have also found huge numbers of false positives with "ska", "era" and "för" to add to the list.
I left a comment on Viktor's post now, referencing this thread. This is a tricky issue since both 'ska' (the music style) and 'era' (time span) could be perfectly valid results depending on context, just like the 'banan' example in Viktor's post.
We managed to disable stemming and retain language support by using Language.None as a parameter for UnifiedSearch and setting ContentLanguage.PreferredCulture to whatever language Find should get content for.
@david, I have added a feature request for similar stuff.