Crawler connector: ignore session id


Aziz Khakulov

I've set up a crawler connector in EPiServer Find, and it seems to create duplicates in the index. When I investigated, I found that the duplicate URLs differ only by the jsessionid in the URL. For example:

http://partille.tromanpublik.se/viewPerson.jsf;jsessionid=823DB252BC63FFA19437786FE7740217?id=93

http://partille.tromanpublik.se/viewPerson.jsf;jsessionid=B602838BA33CA65D0E88B27D4C06E1A5?id=93

Note that the query string is the same and both links lead to the same page (this can easily be tested in a browser). I cannot find any way to configure the connector to ignore the jsessionid. Has anyone done this before? Can Find filter these out?
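To make the desired behaviour concrete, here is a minimal sketch of the normalization in question: dropping the `;jsessionid=...` path parameter so both URLs above collapse to the same key. It is plain Python and not part of EPiServer Find; the function name `strip_jsessionid` is made up for illustration, and whether the connector can apply anything like it is exactly the open question.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches a ";jsessionid=<value>" path parameter, case-insensitively.
# The value ends at the next ';', '/', '?' or '#'.
_JSESSIONID = re.compile(r";jsessionid=[^;/?#]*", re.IGNORECASE)


def strip_jsessionid(url: str) -> str:
    """Return url without a ;jsessionid=... path parameter (hypothetical helper)."""
    parts = urlsplit(url)
    # urlsplit leaves path parameters inside the path, so clean only the path
    # and keep scheme, host, query string and fragment untouched.
    clean_path = _JSESSIONID.sub("", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, clean_path, parts.query, parts.fragment)
    )


urls = [
    "http://partille.tromanpublik.se/viewPerson.jsf;jsessionid=823DB252BC63FFA19437786FE7740217?id=93",
    "http://partille.tromanpublik.se/viewPerson.jsf;jsessionid=B602838BA33CA65D0E88B27D4C06E1A5?id=93",
]

# Both session-specific URLs reduce to a single canonical URL.
unique = {strip_jsessionid(u) for u in urls}
print(unique)
```

Only the path is rewritten, so the `id=93` query string that actually identifies the page is preserved.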

#161199

Oct 07, 2016 15:41

Johan Kronberg

The crawler is very limited in terms of configuration. Our approach has been to identify the crawler on the target site and hide markup that isn't relevant to crawl. I assume you should be able to solve your issue on that side as well.

We've asked Epi to look at what SiteSeeker offered for crawling configuration. It was pretty complete in my opinion and had some options for query string keys that would have fixed your issue.

#162166

Oct 11, 2016 22:51
