Crawler connector: ignore session id

I've set up a crawler connector in EPiServer Find, and it seems to create duplicates in the index. When I investigated, I found that the duplicate URLs differ only by the JSESSIONID embedded in the URL. For example:

http://partille.tromanpublik.se/viewPerson.jsf;jsessionid=823DB252BC63FFA19437786FE7740217?id=93

http://partille.tromanpublik.se/viewPerson.jsf;jsessionid=B602838BA33CA65D0E88B27D4C06E1A5?id=93

Note that the query string is the same and both links lead to the same page (this is easy to test in a browser). I cannot find any way to configure the connector to ignore the jsessionid part. Has anyone done this before? Can Find filter these out?

#161199
Oct 07, 2016 15:41

The crawler is very limited in terms of configuration. Our approach has been to identify the crawler on the target site and hide markup that isn't relevant to crawl. I assume you should be able to solve your issue on that side as well.
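
To illustrate what solving it on the target side could mean here, below is a minimal Python sketch of the URL rewrite involved: dropping the `;jsessionid=...` path parameter so that both example URLs collapse to one. This is not an EPiServer Find feature or setting; the function name `strip_jsessionid` and the regular expression are assumptions made for illustration, and on the real site the same rule would have to be applied wherever links are rendered for the crawler.

```python
import re

# Hypothetical helper, not part of EPiServer Find.
# Matches a ";jsessionid=<value>" path parameter up to the next
# "?", "#", ";" or "/" (case-insensitive).
_JSESSIONID = re.compile(r";jsessionid=[^?#;/]*", re.IGNORECASE)


def strip_jsessionid(url: str) -> str:
    """Return the URL with any ;jsessionid=... path parameter removed."""
    return _JSESSIONID.sub("", url)


# The two URLs from the question above.
urls = [
    "http://partille.tromanpublik.se/viewPerson.jsf;jsessionid=823DB252BC63FFA19437786FE7740217?id=93",
    "http://partille.tromanpublik.se/viewPerson.jsf;jsessionid=B602838BA33CA65D0E88B27D4C06E1A5?id=93",
]

# After normalization both URLs are identical, so a set keeps one entry.
unique_urls = {strip_jsessionid(u) for u in urls}
```

If the site emits links in this normalized form when it detects the crawler, the crawler should only ever see one URL per page.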

We've asked Epi to look at what SiteSeeker offered for crawling configuration. It was pretty complete in my opinion and had some options for query string keys that would have fixed your issue.

#162166
Oct 11, 2016 22:51