We are seeing some pages being indexed by our search crawler with what looks like the output cache parameters appended to the URL. For instance the following URL is indexed: http://www.northshore.org/kellogg-cancer-center/specialties/prostate-cancer-symptoms/?id=&epslanguage=en
Notice the querystring params. Could you tell me why this happens? It seems to be completely random, i.e. a few pages are indexed like this and most are not. They are all the same template. Why would our crawler pick up a URL like the one above? How can I make it not? :-) It should just be: http://www.northshore.org/kellogg-cancer-center/specialties/prostate-cancer-symptoms/
Our config section is thus (let me know if other information would be pertinent):
httpCacheVaryByCustom="path" httpCacheVaryByParams="id,epslanguage" httpCacheExpiration="00:00:00" httpCacheability="Public" pageCacheSlidingExpiration="12:00:00" remotePageCacheSlidingExpiration="02:00:00"
I thought the config we have meant that output caching was turned off. I have never seen a URL like the one above while browsing our site, so I would like to know why our crawler might have picked it up.
This is an issue because our search engine identifies pages by their URL, so we need to ensure that each page is always indexed under the same URL.
The error was probably introduced by an editor who inserted a link with those parameters, perhaps through a copy-and-paste mistake.
The parameters have nothing to do with the output cache. They are standard EPiServer parameters, and if you disable the friendly URL rewriter you will see that all links and URLs carry them.
The settings you are referring to only specify which query string parameters the output cache should vary by.
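If the crawler lets you normalize URLs before they are stored, one stopgap is to strip the two parameters on the indexing side so that both forms of the address map to the same key. The following is only a rough sketch in Python, not anything EPiServer provides: the function name `canonical_url` and the rule of leaving `.aspx` paths untouched are my own assumptions, based on the idea that the parameters are redundant only on friendly URLs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: on friendly URLs these two parameters carry no extra information.
REDUNDANT_PARAMS = {"id", "epslanguage"}


def canonical_url(url: str) -> str:
    """Return the URL with the redundant parameters removed (hypothetical helper)."""
    parts = urlsplit(url)

    # Assumption: non-friendly template URLs (ending in .aspx) really do need
    # their id parameter to identify the page, so leave those unchanged.
    if parts.path.lower().endswith(".aspx"):
        return url

    # keep_blank_values=True so that an empty "id=" is seen and dropped too.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in REDUNDANT_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

With this, the URL from the first post and its clean form would end up under the same key. It does not remove the stray link itself, so the content still needs fixing.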
Thanks for the reply.
Had a hunch that might be the issue.
Now, how to find this content.....
Search for it ;) Which search engine are you using?
ha, yes, that would be the obvious thing to do :-)
sloshing through the results now.
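If searching by hand gets tedious, a small script can flag the offending links in the rendered HTML of each page. This is just a sketch using the Python standard library; the names `SuspectLinkFinder` and `find_suspect_links` are made up for illustration, and fetching the pages is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qsl

# The parameter names seen in the indexed URL.
SUSPECT_PARAMS = {"id", "epslanguage"}


class SuspectLinkFinder(HTMLParser):
    """Collects href values of <a> tags whose query string has a suspect parameter."""

    def __init__(self):
        super().__init__()
        self.suspect_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # keep_blank_values=True so an empty "id=" still counts as present.
        names = {
            name.lower()
            for name, _ in parse_qsl(urlsplit(href).query, keep_blank_values=True)
        }
        if names & SUSPECT_PARAMS:
            self.suspect_links.append(href)


def find_suspect_links(html: str) -> list:
    """Return every link in the given HTML that carries a suspect parameter."""
    finder = SuspectLinkFinder()
    finder.feed(html)
    return finder.suspect_links
```

Run it over the HTML of the pages that link to the affected URL; any href it returns is a candidate for the link an editor pasted in.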