SQL replication and EPiServer CMS
I’ve been spending some time setting up an EPiServer CMS site using SQL replication and will in this post go through some basics, but lets start with the reasons for choosing replication in your architecture. The two obvious reasons are security and scalability. Security because you can have a master database that has zero surface to the internet, it just pushes changes to the replica. Scalability because you have have the visitors and editors on different database servers, you can also have multiple database servers for visitors. The main reason for staying away from replication is complexity, do understand that this setup will require a lot more configuration and maintenance than a “normal” setup, but in some cases the benefits are worth it.
This configuration was done using EPiServer CMS 5 R2 SP2, Microsoft Windows Server 2008, Microsoft SQL Server 2008.
Note: An alternative to SQL replication is EPiServer CMS Mirroring that runs on a higher abstraction level by publishing pages using the API, which means you have more control over what gets mirrored in the application but less control over dependencies such as custom tables (without writing code).
I will not describe in detail how you install and configure SQL replication since that would probably constitute a 100 page book, I am focusing on the configuration related to CMS. The initial approach is to install two copies of your site on two different servers using two different database servers with the replication feature installed. You now have two identical copies of the site. You also need to figure out how to share uploaded files by for example have a shared disk/SAN or use file synchronization.
Preparing the database
There are some changes needed on the schema for “object store”, a technology that is being used by for example XForms, permanent links and versioned files.
Make these changes:
- tblIndex* tables needs primary key on fkSchemaItemId+fkItemId
- tblRelation needs primary key on FromId,ToId
The main area of configuration when you setup replication will be the publication on the source database which describes what objects in the database that should be published.
I’ve included everything but..
These tables are used for syncing group membership for logged in users so I’ve excluded them:
tblMasterSlave is used to store master/slave licenses so it should be excluded unless you plan to have the master and slave on different sides of the replication (which I honestly don’t know if its supported anyway).
Scheduled jobs will not work correctly when replicated so I’ve excluded them (you either don’t have any scheduled jobs on the replica or have a separate configuration):
These tables can normally be excluded, keep them if you are uncertain:
- tblPageSoftlink (links to other pages in the editor for example)
- tblPlugIn (if you don't have plug-in settings that must be propagated)
- tblTask (tasks are normally only used by editors)
If you use replication your destinations are mostly read-only which means that replicating versions make no sense unless you call an API that require the existence of versions. For this to work in EPiServer CMS 5 R2 SP2 you need a database hot fix available in the downloads sections since two stored procedures has that dependency. I suggest that keep these tables in your initial testing and then remove them when you have everything working:
Also exclude these since they are “state”, this also brings up another question regarding the membership provider. You cannot have the master for the membership provider on the source database since the replicas will need to update these tables:
- tblTempKeyword (used as a temp table while indexing keywords)
- Aspnet_* (everything asp.net related)
- CompletedScope (Workflow)
- InstanceState (Workflow)
I also decided to remove these stored procedures from the destination and from the publication just to make sure there was no way to publish any content, this is of course optional as well and only a security measure.
EPiServer has an event management system that is more known as “Remote Events”. This system provides a mechanism for distributing events within an EPiServer CMS site, between EPiServer CMS sites on the same physical server, and between EPiServer CMS 5 sites on separate servers.
This communication is mainly used to expire items in the cache but some other feature take advantage of this platform, for example EPiServer CMS 6 which support stoppable scheduled jobs which use this channel to tell all servers to terminate the job if they are running it.
The technology behind the actual communication channel is WCF and our implementation supports both UPD broadcasts and TCP/IP direct channels. If you want to get into the details have a look at the document Event Management System Specification.
The problem here is that SQL replication is not instant, under normal operations it takes a few seconds for the changes in the database to appear on the remote server but the events are by design instant. The solution to this problem is to have events written to the database instead and have them replicated over to the target and then have the target site site poll that table for new events.
I’ve written a plug-in for the event system that does just that, see the next section on how to install it.
Installation of “Database Events”
The actual binaries can be found in the download sections.
- Run the Database.sql script to install the required table tblRemoteEvents (make sure you include it in the publication)
- Install the DLL's in the bin-directory on both sites:
- Add this block to the <modules>-section (or httpModules if you are on IIS6) on all sites involved to trigger the initialization:
<add name="DatabaseEventsModule" type="EPiServer.Events.Database.RemoteEventsDatabaseModule, EPiServer.Events.Database" preCondition="managedHandler"></add>
- Set enableRemoteEvents + enableEvents to true in site settings to enable the remote events system
- Important: Make sure there are no end points in web.config defined for service EventReplication, it is possible to combine database events with the WCF events but a site cannot receive events from both sources. It is very important that you verify this because otherwise the same event will be delivered twice causing “unexpected behaviors”.
- If you have a single site on the source database you can disable the subscriber to events since there is nothing to read anyway: <appSettings>
- On the sites sitting on the replica you should disable the publisher since they are only read-only and you cannot write to the events table. If you have a solution that require the sites on the replica to distribute events between each other you could use the standard WCF approach to enable those events as well to be distributed between the sites as long as you don’t connect a site sitting on the source database (I can’t think of a situation were you might need this right now but it should be possible if needed). <appSettings>
- Test the functionality by publishing some changes to a page and see it appear on the other site within 20 seconds.
<add key="EPiServer.Events.Database.DisableSubscriber" value="true"/>
<add key="EPiServer.Events.Database.DisablePublisher" value="true"/>
As always you enable logging in EPiServer to track problems, there are logging in this plug-in as well. All events are written to tblRemoteEvents, you should see rows in that table as events fire away.
I’ve done some testing on the setup by stressing the system using the following measures (at the same time of course):
- Content Generator generating 50-100 pages per second
- Applied load using VS Load Test on the web site for the target database, averaged on 40% CPU.
- Manually created pages, deleted pages, emptied wastebasket etc
- Verified that cache expirations worked (new pages appeared within 20 seconds etc)
Overall the performance of replication was good, I did not see more than a few seconds delay before content reached the target. Since replication sets some restrictions on what the target system can do with the database you need to test the functionality that you will be using to make sure it works okay (you can’t have inserts on the target on replicated tables).
SQL replication is not officially supported by EPiServer which basically means that we do not make any tests to make sure every possible scenario works from version to version, this is something you need to do yourself and it may limit you to always stay on the latest version. We will fix any bugs or problems related to replication if possible to make sure it is an option if want to tweak your setup. We will change and add new features to the database that you need to handle when upgrading (determine what should be replicated and what should not). If you make any changes to our schema you need to make sure you remove those before upgrading to a new version since our scripts only knows about how to upgrade our version of the schema, not yours.
Please, let me if you got any feedback or I left out something important. If you think that replication is very important scenario you should say so in the comments, it may help convincing the product owners ;-)