I have a Crawler connector where I've updated the Start URL from domain1.se to domain2.se. Do I need to clear the index myself, or will the crawl process remove the previously indexed domain1 items once a new crawl of domain2 completes?
While the crawl is running, it looks like the items from domain1 are still untouched in the index while items from domain2 are being added.
I guess I will find out later today, but I could save some time by clearing it manually right away if that's what's needed.
Please let us know the final result. My bet is that you will need to clear it manually.
It actually looks like the crawler's previously indexed items were removed once the new run with the same crawler (but with updated domain name) finished.
This is similar to how SiteSeeker crawling operated.
While the crawl was running, new items from the new crawl were added, but the old items were still there.
This is not how SiteSeeker crawling operated.
I think this is the place where the docs should clarify this, but they don't: http://webhelp.episerver.com/latest/find/adding-connectors.htm
The _id of a WebContent document in Find is a hash of the URL. In the standard case, where scheduled indexing runs with the same (or almost the same) settings every time, old versions of crawled documents are overwritten during indexing. As Johan noticed and anticipated, changing the host also changed every URL, and therefore every _id, so he did end up with a duplicated index... for a while.

Each WebContent document also has a session id, which is tied to the crawl in which it was fetched. At the end of each crawl, the connector runs a delete that removes pages that were not fetched in that crawl. This is done so that pages removed from the crawled web site also disappear from the index after the crawl.
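The overwrite-then-cleanup behaviour described above can be modelled with a small in-memory sketch. This is not the Find or connector API. The class ToyWebContentIndex, the use of SHA-1 as the URL hash, the uuid-based session id and the crawl() helper are all hypothetical stand-ins, and the toy index is assumed to hold only one connector's documents. It shows why re-crawling the same URLs overwrites documents, why a host change leaves old and new documents side by side during the crawl, and how the end-of-crawl delete removes the old ones.

```python
import hashlib
import uuid


class ToyWebContentIndex:
    """Hypothetical in-memory stand-in for one connector's documents (not the Find API)."""

    def __init__(self):
        # Maps document _id -> {"url": ..., "session_id": ...}
        self.docs = {}

    @staticmethod
    def doc_id(url):
        # Assumption: SHA-1 of the URL. The real hash used by Find may differ;
        # what matters is that the same URL always yields the same _id.
        return hashlib.sha1(url.encode("utf-8")).hexdigest()

    def upsert(self, url, session_id):
        # Same URL -> same _id, so re-crawling a page overwrites the old document.
        self.docs[self.doc_id(url)] = {"url": url, "session_id": session_id}

    def delete_not_in_session(self, session_id):
        # End-of-crawl cleanup: remove every document not fetched in this crawl.
        stale = [i for i, d in self.docs.items() if d["session_id"] != session_id]
        for i in stale:
            del self.docs[i]
        return len(stale)


def crawl(index, urls):
    """Hypothetical crawl: fetch all URLs under a fresh session id, then clean up."""
    session_id = uuid.uuid4().hex
    for url in urls:
        index.upsert(url, session_id)
    # Until this call, documents from the old host coexist with the new ones.
    return index.delete_not_in_session(session_id)


if __name__ == "__main__":
    index = ToyWebContentIndex()
    print(crawl(index, ["http://domain1.se/", "http://domain1.se/about"]))  # → 0
    print(crawl(index, ["http://domain2.se/", "http://domain2.se/about"]))  # → 2
    print(sorted(d["url"] for d in index.docs.values()))
```

Because the _id is derived only from the URL, nothing in the upsert step can tell that a domain2 page "replaces" a domain1 page; only the session-based delete at the end of the crawl removes the old host's documents, which matches what Johan observed.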