I'm assuming this forum is for all EPiServer 7.5+, BLOB era sites - please move if I've put it in the wrong place.
We maintain an EPiServer 7.19 site for a client and do some small enhancement dev work for them. In our most recent release, we added two new page templates and a handful of new blocks. This all worked fine in development, but we hit odd problems when deploying in multi-node environments, where several application servers share a single DB.
Lets say we have 5 application nodes behind a load balancer, sharing a single DB, and we update these by removing nodes from load, updating them, returning them to load etc, so there is no interruption in service. During this node-shuffling, we lock editors out of the CMS at the load balancer level so no new blocks can be created.
After the update, Node 5 can create instances of the new page/block types without problems. However, if you try on nodes 1-4, the UI seems to freeze, with 500 errors seen on the browser console, and the following error appearing in the logs:
2015-12-03 11:58:03,940  ERROR EPiServer.Global: 1.2.5 Unhandled exception in ASP.NETSystem.Data.SqlClient.SqlException (0x80131904): The INSERT statement conflicted with the FOREIGN KEY constraint "FK_tblContent_tblContentType". The conflict occurred in database "ClientSiteName", table "dbo.tblContentType", column 'pkID'.
There is no problem creating instances of older Page/Block types, only the new ones added in this update.
This behaviour has been consistent across 2 node (node 1 broken, node 2 fine) and 5 node (nodes 1-4 broken, node 5 fine) environments - only the last node to be updated can successfully create these new blocks/pages. The only fix we've found is for the final (working) node to create a single instance of each new page/block type, then to run an iisreset on each of the other nodes in turn. Once this is done, they can successfully create instances of these new pages/blocks.
Searching for the constraint, I found this thread, which suggests this is caused by running multiple versions of code against a single DB. We are technically doing this, transiently, during the piecemeal update process, but as these are new page/block types, and editors are locked out during the update process, I'm slightly confused as to why this would cause problems.
Given our requirement to perform site releases without an interruption in service, does anyone have any suggestions how we can avoid encountering this issue? Having to perform extra load juggling and iisresets is considerably stretching our change windows.
Just a guess, but one of the nodes might have deleted the new content type since it could not find the code. It is fairly easy to protect yourself against that using semantic versioning of your DLL's. See http://world.episerver.com/blogs/Per-Bjurstrom/Archive/2012/10/Synchronization-of-typed-models/
Regarding your deployment strategy, some that have the zero downtime requirement actually starts by creating a new database that the upgraded nodes run off. And the end of the upgrade the old database is removed. A bit more work but fail safe since old and new nodes never touch the same DB and you easily abort the upgrade. Downside ofcourse is that any writes to the old nodes is lost.
Ah, I think that might be it. I was confused, as all nodes showed the new types under the Content Types tab in admin. But I've just found that if I choose one of these types on a problem node, then go to Settings, then hit Revert to Defaults then Save, then I get a "Page type deleted" message for all the newly added types. If I then refresh the page, they're gone completely from the list of types.
I'll take a read through the link you've provided and see if I can find a solution that works for our setup, particularly for wholly new types.
Regarding the transient DB, I agree that's a sensible approach, and one we used to use when the only writes to the DB were from editors (who were locked out at the load balancer). However, we recently had to integrate some third-party developed components which need to write to the DB constantly (worse still, three separate DBs which need to be consistent), and our clients are not willing to disable this functionality for the duration of the change window, so it's no longer an option.
I eventually found a simple solution for this, which was pretty easy to integrate into our release procedure:
The offending code that was deleting the new types only runs on EPiServer startup, and it won't delete anything for which there's a concrete example in the DB. So provided the first node to be updated creates one of each of the new types (and no other nodes are restarted during this time), then all subsequent nodes will re-use the type IDs established by the first, rather than deleting them.
Thanks for the pointers!