Web application warmup during DXC Service deployments
When deploying through the DXC Service Management Portal, warmup is both configured (when needed) and tested automatically as part of the deployment. Since it's usually the most time-consuming part of a deployment, this blog post provides a few more details on what's going on during this step, along with some information on how it can be done as quickly as possible.
How is warmup configured for sites in DXC Service?
During the first part of the deployment, where code is copied and transformations are applied, the deployment engine checks whether an applicationInitialization section exists in Web.config. If it doesn't, it is created automatically (more information about this is available here). Because of the way Application Initialization works on the underlying web server, some further configuration related to rewrite rules and redirects is also applied if needed, to make the warmup requests as effective as possible. Application Initialization accepts a redirect response in the same way as a 200 OK response and does not "follow" it, which means that a simple http to https redirect could render the whole application initialization configuration almost useless. Rewrite rules are therefore configured so that the application initialization requests are excluded from any redirects.
Once the "code copy" part of the deployment has finished, the warmup process begins.
What is happening during the warmup step?
Autoscaling is first temporarily disabled before any warmup starts (making the current number of instances of the web app "fixed") to avoid any unnecessary scale outs that could otherwise be triggered by the warmup process. After that, the first part of an Azure web app "swap with preview" is initiated. This ensures that the deployment slot created during the deployment process will be ready to receive production traffic after the swap is completed, and avoids any restarts of the web app.
Once that has finished, the application initialization process begins automatically within each individual instance of the web application. On top of this, the deployment engine also performs a couple of checks to try to validate that this process works as expected. First, each individual instance's local cache status is verified to ensure the best possible performance and stability of the web app once it's swapped.
The application initialization section is also analyzed to find a suitable hostname that can be used to poll the web app and its instances in the slot and check its response (whenever possible, a feature called routing rules is utilized to be able to use a "real" hostname to reach the deployment slot). Some of the things analyzed in the response are:
- If the header "X-AppInit-WarmingUp" is returned, it means that the Application Initialization engine is still running
- Which web app instance sent the response. If the response doesn't come from the same instance that the request was sent to, it indicates that the load balancer in Azure has redirected the request because the Application Initialization engine is still running
- If the request times out, or if the instance responds with errors, it could potentially indicate that the Application Initialization engine is still running. The process will therefore keep polling the site until the response changes or a timeout value is reached
Once all instances of the web app have been validated successfully (or the timeout value has been reached), the deployment continues by re-enabling autoscaling and allowing the site to be swapped after a manual validation.
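The response checks listed above can be sketched roughly as follows. This is a minimal Python illustration and not the deployment engine's actual code: the ProbeResult type, its field names, the classify function and the three result labels are all hypothetical. Only the "X-AppInit-WarmingUp" header name comes from this post.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ProbeResult:
    """Outcome of one poll request (hypothetical structure)."""
    target_instance: str                 # instance the request was aimed at
    responding_instance: Optional[str]   # instance that answered, None on timeout
    status: Optional[int]                # HTTP status code, None on timeout
    headers: Dict[str, str] = field(default_factory=dict)


def classify(probe: ProbeResult) -> str:
    """Return 'ready', 'warming_up' or 'unknown' for a single probe."""
    # A timeout or an error response might mean warmup is still running,
    # but that cannot be confirmed, so the caller should keep polling.
    if probe.status is None or probe.status >= 400:
        return "unknown"
    # HTTP header names are case-insensitive, so compare in lower case.
    header_names = {name.lower() for name in probe.headers}
    if "x-appinit-warmingup" in header_names:
        return "warming_up"
    # An answer from another instance suggests the load balancer moved
    # the request away from an instance that is still initializing.
    if probe.responding_instance != probe.target_instance:
        return "warming_up"
    return "ready"
```

A polling loop would call classify for every instance and repeat until all of them report "ready" or a timeout is reached.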
What can be done to speed this up?
Local cache
When it comes to local cache, the only thing that can be done to make this as quick as possible is to keep the web app file size as small as possible. Otherwise it's an internal process in Azure that could potentially be affected by, for example, network congestion in the underlying infrastructure.
Local cache can sometimes take only a few seconds to be ready, but it could also take several minutes. There is no guarantee that it will be ready on all instances at the same time, which is why the deployment engine will check all of them individually.
Application Initialization
The amount of time the application initialization process takes is highly dependent on the site and how quickly it starts. It's also affected by the number of links that have been added to the applicationInitialization section, especially if each page takes a significant time to load (the requests are run sequentially).
Configuring a minimal applicationInitialization section to save time during deployment is not recommended, since that could cause new instances to be slow in production instead. It could, however, be a balance between being able to scale out faster and making sure that every single page on the web app has been warmed up.
The logic used to create this section automatically during deployments has been tried and tested over thousands of deployments. Its primary goal is to make sure that scale outs and potential restarts of instances are as seamless as possible, so it's not recommended to create this section manually.
Making sure that the site in the slot is able to respond with "200 OK", however, could save several minutes. It allows the deployment engine to actually detect whether the application initialization process has finished (the "X-AppInit-WarmingUp" header is only returned on successful responses) instead of letting it try to validate this until it hits the timeout.
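Because the warmup requests run one after another, a rough estimate of the total time is simply the site startup time plus the sum of the individual page load times. The small Python sketch below only illustrates that arithmetic: the function name and all the numbers are made up for the example and are not measurements from a real site.

```python
from typing import Iterable


def estimate_warmup_seconds(startup_seconds: float,
                            page_seconds: Iterable[float]) -> float:
    """Estimate sequential warmup time: startup plus every page in turn."""
    return startup_seconds + sum(page_seconds)


# Illustrative numbers only: a 60 second startup, one slow first page
# and two pages that are fast once the site is running.
example_total = estimate_warmup_seconds(60, [20, 5, 5])
```

With these made-up numbers the first page accounts for most of the request time, while each additional page adds comparatively little.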
At the time of writing this blog post, the timeout values are as follows:
- If no warmup is detected (for example if the site just keeps responding with errors), it will wait for up to 10 minutes
- If warmup is detected for at least one instance, it will wait for a maximum time of 25 minutes
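The two timeout rules above amount to a simple choice, sketched here in Python. The function and constant names are hypothetical, and the values are only the ones stated in this post at the time of writing, so they may have changed since.

```python
from typing import Iterable

# Values taken from the post, in minutes. The names are hypothetical.
NO_WARMUP_DETECTED_TIMEOUT = 10
WARMUP_DETECTED_TIMEOUT = 25


def max_wait_minutes(warmup_detected_per_instance: Iterable[bool]) -> int:
    """Pick the maximum wait based on whether any instance showed warmup."""
    if any(warmup_detected_per_instance):
        return WARMUP_DETECTED_TIMEOUT
    return NO_WARMUP_DETECTED_TIMEOUT
```

For example, a site where every instance keeps responding with errors would get the shorter limit, while a single instance reporting warmup is enough to select the longer one.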
Why does this have to finish before I can swap the site?
As part of the swap process, Azure also tries to validate the slot before the swap is made. These checks include things like warmup and local cache, to ensure that the new slot will behave as it's supposed to after the swap. If these checks fail, Azure will simply block the swap request, so there isn't really any point in trying until the site is ready.
How can I know what took so much time in my latest deployment?
The details of the warmup process for each individual instance are logged in the detailed deployment log, which can be accessed by first opening the output log of the deployment and then clicking the "Get Detailed Log" button (see "Deployment job log output" in this article).
Is it wrong to keep it simple and just have one line of "/" in the applicationInitialization-section?
Tried that and getting:
"Failed to validate routing rule support. The error was: Failed to locate what hostname to use: Couldn't find any valid hostname by analyzing the applicationInitialization section from the slot slot"
@Johan, I wouldn't go as far as to say that it's wrong, since it's valid from an IIS standpoint (not recommended though, since the site most likely won't be properly warmed up using a configuration like that). The warning/error you're seeing is logged because we try to analyze that section to find out what hostname to use when validating the site and checking the warmup status. Since most customers use the "builtin" one we provide, we know that the links and hostnames in this section work, and those who add it "manually" have most likely put some thought into what they added here. In this case, however, no hostname is specified at all.
If you want to keep a minimal warmup section for this site, I recommend that you simply add a valid hostname (used by that web app of course) using the hostName property, as is done in the example here. The error/warning you're seeing is a "soft fail" though. We will use other methods of validating the site as a fallback, so the error in itself is not a big problem in any way.
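To illustrate why a section with only "/" yields no hostname, here is a small Python sketch that reads the hostName attributes from an applicationInitialization section. It is an assumption about the general idea, not the deployment engine's actual analysis. The warmup_hostnames function is hypothetical and "mywebapp.dxcloud.episerver.net" is a placeholder hostname. The initializationPage and hostName attributes are the standard IIS ones.

```python
import xml.etree.ElementTree as ET
from typing import List

# Minimal section: a single "/" entry without a hostName attribute.
MINIMAL = """
<configuration>
  <system.webServer>
    <applicationInitialization>
      <add initializationPage="/" />
    </applicationInitialization>
  </system.webServer>
</configuration>
"""

# The same entry with a hostName added (placeholder value).
WITH_HOST = """
<configuration>
  <system.webServer>
    <applicationInitialization>
      <add initializationPage="/" hostName="mywebapp.dxcloud.episerver.net" />
    </applicationInitialization>
  </system.webServer>
</configuration>
"""


def warmup_hostnames(web_config_xml: str) -> List[str]:
    """Return the distinct hostName values found in the warmup section."""
    root = ET.fromstring(web_config_xml)
    hosts: List[str] = []
    for section in root.iter("applicationInitialization"):
        for entry in section.findall("add"):
            host = entry.get("hostName")
            if host and host not in hosts:
                hosts.append(host)
    return hosts
```

Calling warmup_hostnames(MINIMAL) gives an empty list, which corresponds to the "couldn't find any valid hostname" situation, while the second configuration gives one usable hostname.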
However, if it's a production site, I highly recommend that you add a more robust warmup section (or simply remove it and let the deployment engine do it automatically) since having one request to the default page only will very likely cause issues for new instances. Even for a preproduction site, load tests will probably be a hit or miss during scale outs.
Normally, having a larger warmup section doesn't make the deployments or scale outs that much slower either. The first few requests might take a while, but most of the time the following requests will be very fast and provide something of a safety net in case something goes wrong with the first few requests (which happens fairly often for a number of reasons). And even if it takes some time, it's usually better that it's slow before receiving production traffic rather than being slow while it's receiving it.
We've also made a few improvements to how it's handled automatically during deployments recently so might be worth trying that out again in case you haven't.
Not having to do anything should hopefully be even simpler :-)
OK! Is it recommended to acquire and add the slot hostnames as well for this config? The one I get e-mailed now is *-slot.azurewebsites.net but I saw the docs had another domain...
@Johan, the slot hostnames are not needed in this config, just use one of the hostnames that have been assigned to the actual web app in that environment, for example the "<webapp>.dxcloud.episerver.net" address.
@Anders, how is the swap done when there are N > 1 production instances?
Does it create N slots, which are replaced after warmup with all running instances? Or only one slot? If so, how are the rest of the production instances warmed up and swapped?
@Aleksander - First, sorry for my late reply on this. We will warm up the same number of instances for the deployment slot as the production slot currently has and swap all of them at the same time to make sure the web app has enough resources to handle the current traffic load. Hope that answers your question, if not, please reach out again!
Thanks @Anders for your answer. I have one more question. After the code is deployed to the slot, autoscaling is enabled. Let's assume that the application scales out before the slot is swapped. How does the warmup work on a new instance? Does it warm up both the current and the new application code at the same time?
That's correct @Aleksander, we enable autoscaling again after the deployment slot has been warmed up. If the site then scales out before the "complete/go live" step is started, a new instance is started that holds both the current and the new version of the code, and both are warmed up before they receive any traffic.
We will also validate the warmup status of the new instance before proceeding with the actual swap (and we will lock down the number of instances again during the complete/go live process so it doesn't scale out again until the swap has finished).
Thanks Anders for a good article! We have been struggling with problems related to DXC production deployments.
The first issue is that even though we set the "use maintenance" option, our changed models were not applied, so the model sync did not happen. We are now in discussion with support to understand how the deployment process / Epi identifies whether the model sync is needed or not. Do you happen to have any insight into this?
The second thing is that our deployments also take an awful lot of time, and the customer is complaining about the downtime. I would be really interested to know at which point of the deployment process the site potentially stops answering end users for a while, both with the maintenance page option and without it. The deployment log we see is really detailed, but this is something I cannot figure out from the log: when we start showing the maintenance page and when the site is really back online.
@Janne, You're welcome! Glad you found it useful! :-)
Regarding the first issue, I'm pretty sure I found the support ticket in question and it looks like it's been escalated to the team responsible for how the model sync works so I'll let them respond there (it's not the deployment process per se that identifies if the model needs to be updated or not, that logic is part of the CMS and runs when the app is started so they are definitely the experts in this area). Pretty sure you've already seen it but I just wanted to mention the documentation for it in case it can help.
Regarding the second part about deployment times: the "code copying", slot preparation and verification of the configuration usually take ~5 minutes or so. After that it's time to warm up the new slot, and it's pretty common that the warmup process takes around 10 minutes to finish completely. While the primary/production slot should still be running and serving end users, it does utilize the same underlying resources (same app service plan), so if initialization is using up a lot of resources it could potentially affect production performance when the deployment slot starts up. Not sure if that's what you've seen?
Regarding the timing of the maintenance page (and/or the potential performance degradation mentioned above), this happens after the code has been deployed and verified and it's time to start up the new slot (the point in time where the deployment process reaches 80% in the DXC-S management portal). If the maintenance page option was selected, it's enabled just before the new slot starts up (to get a bit technical: the page is placed on a slot that is swapped with the production slot, which is then stopped). This is logged in the deployment log as "All sites in this deployment are now in maintenance mode". If the maintenance page option wasn't selected, the new slot is simply started and gets prepared for "go live" (warmup).
The maintenance page would then be up until the site has been verified manually and the Go Live/Complete button is pressed in DXC-S management portal and the new slot is swapped (usually just takes a few minutes but we do check warmup and local cache again at this point in case the site scaled out, if it has, we will wait for the new instance(s) to become ready as well before the swap is performed).
Apologies in advance if I misunderstood the question/issue you were facing, feel free to reach out again in that case and I'll try to provide a better answer!
@Anders
In one of my projects, where we require users to be logged in to view the page content (no anonymous access), warmup and/or initialization always hits the root URL, which gives a 403 error, and it looks like it times out every time. I have created a dedicated warmup endpoint and configured appInitialization, but the whole warmup process didn't speed up at all. It hits my warmup endpoint, but it also hits the root URL and it always times out.
I have created a thread with more details, would appreciate if you could share some insight:
https://world.episerver.com/forum/developer-forum/-Episerver-75-CMS/Thread-Container/2020/9/dxp-deployment---warmup-always-times-out/
@Jerzy: Thanks for raising this issue with us. I've responded in the forum thread as well, but to summarize: I think adding that separate endpoint is a good approach that will make scale outs more efficient, which is the most important part of this. For deployments, however, we might need to make some changes on our side to make this particular scenario better/more efficient.