We had an incident last week where, after an IIS App Pool recycle at night, the EPiServer initialization phase failed (and every following requests) with the following error:
The DELETE statement conflicted with the REFERENCE constraint "FK_tblContentProperty_tblPropertyDefinition". table "dbo.tblContentProperty", column 'fkPropertyDefinitionID'.
After some digging we realized this was happening due to the fact that EPiServer had found a specific content type (a page type) that was no longer defined in code but existed in the database, and it appeared not to be used anywhere, so it tried to do some clean-up by running the netContentTypeDelete stored procedure. And this is where the above constraint failed.
Question is, how did it succeed to go through all the usage validation before coming down to the delete statements? When looking in the database I found something strange when running this statement:
SELECT * FROM tblContentProperty WHERE fkPropertyDefinitionID IN (SELECT pkID FROM tblPropertyDefinition WHERE fkContentTypeID = XYZ)
The tblContentProperty table still had records refering to property definitions (in tblPropertyDefinition) defined to be used for the content type XYZ, i.e. the content type seemingly not being in use and was about to be deleted by EPiServer. Shouldn't this be detected in validation steps earlier? Furthermore, when examining those "ghost records", I noticed that their fkContentID referred to content of a totally different, unrelated, content type. How can this be, should it even be possible? I probably need some expert help from someone with good knowledge about the EPiServer database here... :)Anyways, I had to manually delete those records that still existed (plus some corresponding records in tblWorkContentProperty) in order to get the site up and running again. Now, the more interesting part is how did this happen and what triggered this now? The old content type that it was trying to delete had not been worked with for a month or so, with several other deployments afterwards.
We are speculating that having a parallel staging site (with different binaries and possibly other content types) running against the same EPiServer database might have led up to this, somehow... Could it be? Not a good setup I know, probably we should have a dedicated staging site database with some replication stuff if necessary.
Another thing we thought of was that an existing content type class had been renamed whilst a new one had been created with the same name as the previous one. Possible this could have gotten EPiServer confused? Note that we have never had any duplicate content type GUIDs, so it's not that easy.
We greatly appreciate any help or clues on this one, because we sure don't want this to happen again!
Yeah... It's problematic to run separate code versions against the same database.
It's wise to use a version number on your assemblies, then the later number's type registration will take precedence.
I also recommend never removing content types from code. Just set them to not display.
If you have problems from running different versions against the same db you can escape some of them by clearing the ModelType column for the row in tblContentType.