Geographically scaling an Episerver site over multiple Azure regions
There are many reasons why you might want to run your site out of multiple regions: performance, SEO, disaster recovery, legislation or even just preference. In our current era of DevOps and cloud technology, it’s become almost trivial to deploy a codebase simultaneously to servers around the world, so you’d be forgiven for thinking that deploying an Episerver site to several global locations would be just as simple. But there’s a catch. Behind every modern Episerver site is a SQL Server database and, for all its plus points (and there are many), geographical distribution isn’t really SQL Server’s forte. That means you’re basically stuck with your database hosted in a single location and, because your web servers need the fastest access they can get to the DB, your web servers are limited to that same location too.
Azure SQL to the rescue?
Unwilling to accept that limitation, I thought I’d take a look at Azure SQL to see if I could find a workaround. Though it shares a common heritage with SQL Server, Azure SQL does have a couple of features which look like they may be of use.
First up, “SQL Data Sync”. This feature is mainly aimed at managing hybrid infrastructure, so it’s really targeted at DBAs used to setting up SQL Server replication, but what it gives us is multiple writeable clones of our database which are magically synced by a separate task. That’s where the good news stops, though. If you take a look at the FAQs, alarm bells should start to ring. First of all, you need to tell it what to sync down to the column level in your database tables which, aside from taking forever to set up, is likely to cause problems in the future should the schema change as a result of upgrades and the like. There’s also the issue of timing. The synchronisation task runs on a schedule with a frequency of between 5 minutes and 30 days, which means that, even at the most frequent setting, it could be 5 minutes before a change is synchronised and even longer before it’s visible on the site thanks to the caching within the Episerver DataFactory. In summary, you could probably make this route work, but you’d most likely spend the remainder of your life wishing you hadn’t.
Our other option is geo-replication. While this feature was primarily built for disaster recovery scenarios, it’s much closer to what we need in that the synchronisation is a continuous process performed as the data changes rather than on a schedule. It’s also worth mentioning that geo-replication in Azure SQL is really easy to set up: you click “Geo-Replication” on your database in the Azure Portal, pick the region you want to replicate to, choose the characteristics of the secondary database and then just sit back while Azure handles the rest.
But there’s a catch. Because this mechanism was built for DR/failover scenarios, we’re left with a fully functioning primary database but read-only replicas. That said, as long as you don’t need to write to the Episerver database from your remote sites, you can simply switch those sites into read-only mode.
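As an aside on that last point: newer versions of the CMS have a read-only database mode built in (introduced, if memory serves, around CMS 10.3). I haven’t checked the exact syntax against every version, so treat this as a sketch, but the switch lives in the remote instances’ web.config (with their EPiServerDB connection string pointed at the local replica), along these lines:

<episerver.framework databaseMode="ReadOnly">
  <!-- existing framework configuration -->
</episerver.framework>

With that set, the CMS treats the database as read-only, which is exactly what we want for instances sitting on top of a replica.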
So far, so good, but what happens when you publish content in the CMS? Well, as you’ve only got one writable database, all of your editing needs to happen in your primary region. When you hit publish, the status of the content is updated in the database and the publish event is fired. The remote site picks up that event and clears the relevant record from its cache, to be rebuilt on the next request. That would be fine if the database synchronisation completed before the event was fired, but it doesn’t: the synchronisation happens asynchronously, so there’s a slight delay between the primary reporting that it’s updated and the replica actually receiving the update. That delay is typically less than a few seconds, but it’s just enough to cause us problems. If someone requests the updated page on the remote site between the primary updating and the replica catching up, the cache on the remote site will be rebuilt with the stale data and our change won’t be visible until that cache is cleared again. What we ideally need, then, is to hold off on clearing the remote cache until the data has been replicated.
Delayed events
Just to recap on the remote events process as it stands for sites on Azure: when an event is raised by one instance of the site, the details of the event are added to a message sent to a topic within Azure Service Bus. All instances of the site are subscribed to that topic, so each one receives and acts on the message. If we want to add a delay for some subscribers, our easiest route is to create a second topic which the remote instances subscribe to, and to pass messages from one topic to the other once the delay has passed.
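For reference, that wiring normally comes from the Azure event provider registration in web.config. I’ve not reproduced this from a live config, so double-check the details, but it looks something like this (using the same topic and connection string names as the code further down):

<episerver.framework>
  <event defaultProvider="azureevents">
    <providers>
      <add name="azureevents"
           type="EPiServer.Azure.Events.AzureEventProvider, EPiServer.Azure"
           connectionStringName="EPiServerAzureEvents"
           topic="mysiteevents" />
    </providers>
  </event>
</episerver.framework>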
Given that there’s no way to know for sure exactly when the relevant data will have been synchronised, we’re going to have to make an assumption as to how long the sync will take and delay the events reaching the remote servers by that amount (if you’d rather base that assumption on data than guesswork, see the sketch below). There are a few ways we could go about this (a custom event provider, Azure Functions, a Logic App, etc.) but I like to keep things simple and reuse as much existing functionality as we can.
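On picking the delay: Azure SQL exposes an approximate lag figure for each replication link via the sys.dm_geo_replication_link_status DMV on the primary. It won’t tell you when a specific transaction has landed on the replica, but it’s a useful sanity check for whatever delay you settle on. A minimal sketch, assuming plain ADO.NET and a connection string for the primary database:

using System;
using System.Data.SqlClient;

public static class ReplicationLagChecker
{
    // Queries the primary for the current state and approximate lag (in seconds)
    // of each geo-replication link
    public static void PrintLag(string primaryConnectionString)
    {
        using (var conn = new SqlConnection(primaryConnectionString))
        {
            conn.Open();
            using (var cmd = new SqlCommand(
                "SELECT partner_server, replication_state_desc, replication_lag_sec " +
                "FROM sys.dm_geo_replication_link_status", conn))
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("{0}: {1}, lag {2}s",
                        reader["partner_server"],
                        reader["replication_state_desc"],
                        reader["replication_lag_sec"]);
                }
            }
        }
    }
}

With a ballpark figure in hand, we can get on with actually delaying the events.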
As luck would have it, Azure Service Bus has some features which will help us with this, namely auto-forwarding, filter rules and message scheduling. By using these, we shouldn’t need any additional code or services to monitor, wait and forward messages; the Service Bus can handle all of that for us.
To get this up and running, I’ve created an initialisation module which sets up the topics (if they don’t already exist) and adds a subscription which picks up messages from the local topic and schedules each one for delivery to the new topic after a slight delay (10 seconds).
using System.Configuration;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

[ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
public class ServiceBusGeoInitialisation : IInitializableModule
{
    public void Initialize(InitializationEngine context)
    {
        var connStr = ConfigurationManager.ConnectionStrings["EPiServerAzureEvents"].ConnectionString;
        var topic = ConfigurationManager.AppSettings["LocalTopicName"] ?? "mysiteevents";
        var geoTopic = ConfigurationManager.AppSettings["DelayedTopicName"] ?? "mysiteeventsdelayed";
        var nsMgr = NamespaceManager.CreateFromConnectionString(connStr);

        // Create our local topic if it doesn't already exist
        if (!nsMgr.TopicExists(topic))
        {
            nsMgr.CreateTopic(topic);
        }

        // Create our delayed topic if it doesn't already exist
        if (!nsMgr.TopicExists(geoTopic))
        {
            nsMgr.CreateTopic(geoTopic);
        }

        // Create a subscription which auto-forwards messages from our local topic to the delayed topic
        if (!nsMgr.SubscriptionExists(topic, "GeoForwarder"))
        {
            var subscription = new SubscriptionDescription(topic, "GeoForwarder")
            {
                ForwardTo = geoTopic
            };
            nsMgr.CreateSubscription(subscription);

            var subscriptionClient = SubscriptionClient.CreateFromConnectionString(connStr, topic, "GeoForwarder");

            // Clear out the default rule (which matches everything and applies no action)
            subscriptionClient.RemoveRule(RuleDescription.DefaultRuleName);

            // Set up a new rule to match all messages and schedule each one for 10 seconds' time
            var rule = new RuleDescription
            {
                Name = "GeoDelayRule",
                Filter = new SqlFilter("1 = 1"),
                Action = new SqlRuleAction("SET sys.TimeToLive = '00:00:10'; SET sys.ScheduledEnqueueTimeUtc = sys.ExpiresAtUtc; SET sys.TimeToLive = '00:30:00'")
            };
            subscriptionClient.AddRule(rule);
        }
    }

    public void Uninitialize(InitializationEngine context)
    {
    }
}
For those wondering what’s going on in the rule’s action: it would appear that the Azure Service Bus SQL syntax doesn’t support any kind of date manipulation functions, so I’ve had to find a workaround. In this instance, I’m setting the TimeToLive value on the message to 10 seconds, which automatically sets the ExpiresAtUtc value. I can then take the value of ExpiresAtUtc, use it to set the ScheduledEnqueueTimeUtc field and finally reset the TimeToLive to 30 minutes so the scheduled message doesn’t expire before it’s delivered. Crazy to have to do it that way, but it works.
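That just leaves the “bit of config” on the remote instances: pointing their event provider at the delayed topic rather than the local one. Assuming the same provider registration shown earlier, that’s simply:

<event defaultProvider="azureevents">
  <providers>
    <add name="azureevents"
         type="EPiServer.Azure.Events.AzureEventProvider, EPiServer.Azure"
         connectionStringName="EPiServerAzureEvents"
         topic="mysiteeventsdelayed" />
  </providers>
</event>

The primary region carries on publishing and subscribing on mysiteevents as before, while the remote regions only ever see the 10-second-delayed copies arriving on mysiteeventsdelayed.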
So there you have it: one initialisation module and a bit of config, and we’ve got an Episerver site running happily across multiple Azure regions. There are, of course, other options available which I’ll aim to cover in a subsequent post, but hopefully you’ve found this useful.
As always, the code posted here is simply a proof-of-concept rather than tested, production-ready code, so use it with caution.