I ran into some trouble today when I was preparing a new site for production (from a fresh production db copy), using a _new_ EPiServer Find index.
I ran the scheduled indexing job manually, and the following turned up in the error log for 35 pages, all of the same page type, News (some pages of that page type had been republished before running the job):
An error occured while indexing (Page): MapperParsingException[Failed to parse [MainIntro$$string]]; nested: MapperParsingException[failed to parse date field [Flera nya MEDDEV-guider utkom i början på året på EU-kommissionens webbplats. Bland andra den efterlängtade guiden om medicinteknisk mjukvara.], tried both date format [dateOptionalTime], and timestamp number]; nested: IllegalArgumentException[Invalid format: "Flera nya MEDDEV-guider utkom i ..."];
I got the same exceptions when running the job again. To keep things short: I discovered that there seems to be some sort of automatic initial mapping done somewhere (?), based on the value of the field. After some more testing, another exception was thrown that supports my theory:
An error occured while indexing (Page): MapperParsingException[Failed to parse [PageName$$string]]; nested: MapperParsingException[failed to parse date field [2010-04-14 FDA meddelar tillverkare av produkter för strålbehandling om striktare godkännandeprocesser än tidigare], tried both date format [dateOptionalTime], and timestamp number]; nested: IllegalArgumentException[Invalid format: "2010-04-14 FDA meddelar tillverkare av pro..." is malformed at " FDA meddelar tillverkare av pro..."];
The exception is now thrown for a different field (PageName). What I did beforehand was to create a new page with e.g. "2012-08-28" as PageName and index it into an _empty_ EPiServer Find index. That then caused other pages to fail if their PageName did not contain a valid date.
The index is working just fine now, after deleting the index, and running the scheduled job manually, to get a clean index.
I don't know if my theory is 100% accurate, but there is without a doubt something fishy going on here :)
You're correct, there are automatic mappings going on, and this is a tricky case. Fields need to be mapped to a type in the index so the search engine knows how to parse and index them. Creating mappings manually is a tedious task. Luckily, elasticsearch supports automatic mappings: the first time it encounters a new field, it applies a template. Which template is used can be determined by the type of data in the field and/or a naming convention.
What you have run into is a problem caused by the lack of a date format in JSON. I'm guessing that the very first document that was indexed had a PageName that only contained a date. The search engine then recognized it as a date, as a string with a date is how a date is represented in JSON, and mapped the PageName field as date. When other documents with non-date-only PageNames were indexed they failed as the search engine was expecting them to contain dates.
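To make the mechanism concrete, here is a rough Python sketch of "first value decides the mapping". It is only a toy model, not elasticsearch or Find code: `ToyIndex` and `looks_like_date` are made-up names, and the date check only accepts plain `YYYY-MM-DD` strings, which is much narrower than what the real engine tries.

```python
from datetime import datetime


def looks_like_date(value):
    """Toy date detection: accepts only strings of the form YYYY-MM-DD."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


class ToyIndex:
    """Toy index where the first value seen for a field fixes its type."""

    def __init__(self):
        self.mappings = {}  # field name -> "date" or "string"

    def index(self, doc):
        for field, value in doc.items():
            # First time we see the field: guess its type from the value.
            if field not in self.mappings:
                guessed = "date" if looks_like_date(value) else "string"
                self.mappings[field] = guessed
            # Later documents must fit the type that was guessed first.
            if self.mappings[field] == "date" and not looks_like_date(value):
                raise ValueError(f"failed to parse date field [{value}]")


# A date-only PageName indexed first locks the field to "date" ...
idx = ToyIndex()
idx.index({"PageName": "2012-08-28"})
try:
    # ... so an ordinary title is rejected afterwards.
    idx.index({"PageName": "2010-04-14 FDA meddelar tillverkare"})
except ValueError as exc:
    print("rejected:", exc)
```

If an ordinary title is indexed first instead, the field is mapped as a string and a later date-only PageName is accepted, which matches the observation that a clean reindex made the errors go away.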
At this point in time this is a limitation in Find, although I definitely think it could be alleviated in the future.
Just as I thought then, good to know. I guess it isn't a very big problem, as long as the initial indexing is done correctly. Now I know what to do when the log files are flooded with MapperParsingException, and hopefully a few others will too by reading this. Thank you for your response!
I'm currently experiencing a similar issue, seemingly not related to a date, but to an array:
EPiServer.Find.Cms.PageIndexer.ReIndex - An error occured while indexing (Page): MapperParsingException[object mapping for [tbody] tried to parse as array, but got EOF, has a concrete value been provided to it?].
I currently cannot see which page is causing this, but it seems likely that it is the Home page (7 errors, 7 pages).
So my question is, how do we get around this issue? Is there a way to do this without clearing the index completely or is this our best option?
I had to clear the index in order to sort out the mapping exceptions. Another solution might be to delete all the items of the specific type that fails, but I didn't try that.
And another thing:
I forgot to point out that you are unable to see which page/folder throws an error, as Henrik said. It would be nice if this was included in the logging. I had some issues last week with some page files not getting indexed because of bandwidth/file size limitations, and I was unable to see which page caused the error. It can be (and was) pretty difficult and time consuming to locate the source of the error.
Well, this being the Home pages, I guess that clearing the index will be my best option in my case.
Unfortunately clearing the index didn't resolve it for us, and my bet on it being the homepage seems wrong. Is there any way to get information about which pages are failing, or at least which types the fields are mapped as?
I compared the type statistics in the index with the actual number of pages in EPiServer to determine which type (PageType) failed. I guess another solution is to build your own simple loop that indexes each page manually, together with some logging. Note that the cause of this error is the values of the fields, not the data type of the field, at least in my case.
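Roughly what I mean, sketched in Python just to show the idea (this is not the Find API). `index_one` stands for whatever call actually sends a single page to the index, and the `id` and `type` keys on each page are made up for the example.

```python
import logging
from collections import Counter

logger = logging.getLogger("reindex")


def missing_by_type(cms_pages, indexed_counts):
    """Compare page counts per type in the CMS with the counts in the index.

    Returns {page_type: number_missing} for every type whose counts differ.
    """
    expected = Counter(page["type"] for page in cms_pages)
    return {
        page_type: count - indexed_counts.get(page_type, 0)
        for page_type, count in expected.items()
        if count != indexed_counts.get(page_type, 0)
    }


def index_individually(pages, index_one):
    """Index pages one at a time and log which page each failure belongs to.

    index_one is a hypothetical callable that indexes a single page and
    raises an exception on failure. Returns a list of (page id, error text).
    """
    failures = []
    for page in pages:
        try:
            index_one(page)
        except Exception as exc:  # keep going so one bad page does not stop the run
            logger.error(
                "Indexing failed for page %s (type %s): %s",
                page["id"], page["type"], exc,
            )
            failures.append((page["id"], str(exc)))
    return failures
```

The count comparison narrows it down to a page type, and the loop then tells you exactly which pages of that type are rejected and with what message.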