Severely slow startup time on Azure

Hi all.

First time here.

I'm not the most experienced Episerver person around, and I recently migrated our site from Azure-hosted VMs to Azure App Service.

A particular issue I need to solve right now is painfully long startup times after a deployment to App Service. On VMs it would typically take a minute or two, but on App Service it's taking a much more painful 12 minutes or so.

Prior to go-live at the beginning of last week, I had not noticed this behaviour - not on the deployment slot, nor on the QA or staging slots. After testing on one of our QA slots afterwards, I did see the same 12 minutes once, but for the most part, the non-production slots have not shown this slow startup behaviour.

All slots were created from the same script, and all DBs are fairly similar, coming from a not-too-old production DB.

Not entirely sure what details will be useful to people. Any suggestions on where to start looking for likely culprits would be most useful. 

Edit: 

Currently running version 11.12

#249047
Edited, Feb 23, 2021 11:03
valdis - Mar 01, 2021 14:05
Did you check diagnostics section for the App Service? What do you see in events logs? What about performance metrics? Pricing tier is the same?

I've also noticed that for a customer. I haven't looked into it in detail, though. Syncing the content model to the DB would be my first bet. There is a nice tool for checking the timings of modules that run on startup. Maybe that can point you in the right direction:

https://github.com/episerver/DeveloperTools
or
https://www.david-tec.com/2015/02/episerver-debugging-tools/

Additional logging using log4net at info level, or Application Insights, will also be useful for finding any performance bottleneck.

Another thing to be aware of with Azure hosting is that DNS will update when using slots. If you are reusing HttpClient instances (which you should), that might cause some other issues as well. The fix is to refresh the DNS after x seconds...
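
As a rough illustration of the log4net suggestion, a configuration along these lines would write timestamped INFO-level entries to a rolling file, so gaps between entries during startup become visible. This is only a sketch: the appender name and the log file path are made up for the example, and an existing Episerver site will already have its own log4net configuration to adapt rather than replace.

```xml
<log4net>
  <!-- Hypothetical appender name and file path, chosen for illustration only -->
  <appender name="startupFileAppender" type="log4net.Appender.RollingFileAppender">
    <file value="App_Data\startup.log" />
    <appendToFile value="true" />
    <rollingStyle value="Size" />
    <maxSizeRollBackups value="5" />
    <maximumFileSize value="10MB" />
    <layout type="log4net.Layout.PatternLayout">
      <!-- %date gives a timestamp per entry, which is what makes slow steps stand out -->
      <conversionPattern value="%date [%thread] %-5level %logger - %message%newline" />
    </layout>
  </appender>
  <root>
    <!-- INFO is noisier than the usual WARN/ERROR; consider reverting after the investigation -->
    <level value="INFO" />
    <appender-ref ref="startupFileAppender" />
  </root>
</log4net>
```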

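// Requires: using System; using System.Net;
var sp = ServicePointManager.FindServicePoint(new Uri("http://epi.test/"));
sp.ConnectionLeaseTimeout = 60 * 1000; // in milliseconds: recycle connections after 1 minute so DNS is re-resolved

It doesn't sound like that is the issue, but it's worth mentioning.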

#249057
Edited, Feb 23, 2021 12:39
Chris - Feb 24, 2021 8:36
Thanks, will try and get those dev tools running on QA.

Logs currently aren't illuminating anything interesting.

I think first of all you need to see which application layer has the problem.

Once you deploy, check database activity and web application activity on first load.

Do you have any redirects or a queue system in place for production?

Initialization modules could be an issue but you said the website is working fine in Dev & test environments. 

Do you warm up your deployment slot?

The tool @Daniel mentioned seems useful, but I have never used it. Also, if the problem is only with the first load, then this add-on would not be any help. Normally the Azure portal gives you a lot of information to investigate.

Look at SQL activity

Look at Availability in Application Insights

Look at Application Map in Application Insights

Look at Performance in Application Insights

#249058
Feb 23, 2021 12:55
Chris - Feb 24, 2021 8:41
What I did notice last time was a large execution time on the web side and, on the DB, a drop in CPU usage but an increase in deadlocks.

Azure is currently recommending that a couple of indexes be created against some of the larger tables in Episerver. I am considering enabling those as Azure recommends, but am after some input from the guys who helped set up the infrastructure on Azure.

There may still be some other monitoring options not enabled for this site.

This is definitely a first-load issue. Once it's running, it's snappy - far more responsive than it was on the VMs.

Chris - try one experiment: scale up your SQL server, and if the problem disappears, then you need to optimize your database and apply indexes.

On first run, the application always takes time to run initialization modules, but 12 minutes is too much.

#249173
Feb 24, 2021 22:04
Daniel Ovaska - Feb 25, 2021 9:02
Yeah, slow SQL is usually the issue during startup. There are a LOT of SQL calls for the first requests before things are all cached and synchronized.
Chris - Mar 02, 2021 8:42
Hi.

Yes, I have applied a number of the indexes that Azure recommended, which has helped with some issues I was experiencing. But having been pulled on to other higher-impact issues for now, I have not gotten to testing another deployment. I'm hoping to tomorrow.

We're just so understaffed that finding time for anything is a nightmare :(

Of the easy things to try, indexing shaved 50% off. 

Also, scaling up the site made a big difference too - reducing by another 50%. Whether that is because initial scale is too low, or there is some heavy hitting initialisation task going on, I don't know yet. 

So, those two changes had it approaching original levels (around 2 mins) but not quite all the way there, being more like 3 mins. 

But I have an odd issue whereby those indexes have for some reason been reverted. I can't imagine that happened any way but automatically. I'm having a word with some systems people.

Still severely pushed for time to try anything remotely more time consuming until mid/late next week. 

#249613
Mar 05, 2021 14:09

Chris - good to know some solutions work for you.

When optimizing, keep in mind that the initial load always takes a little more time than usual. That's why, for preproduction and production, it is recommended to warm up the slot before swapping.
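
For reference, one common way to express a warm-up on IIS-based hosting is the applicationInitialization section in web.config, which App Service takes into account when warming up a slot before a swap. The fragment below is a minimal sketch: it assumes the site's root page ("/") is enough to trigger Episerver initialization, which may not hold for every setup, and any additional pages would need to be listed explicitly.

```xml
<configuration>
  <system.webServer>
    <applicationInitialization doAppInitAfterRestart="true">
      <!-- Assumption: requesting the start page forces full site initialization -->
      <add initializationPage="/" />
    </applicationInitialization>
  </system.webServer>
</configuration>
```

With something like this in place, the slow first load should happen on the staging slot during the swap rather than in front of production visitors.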

#249615
Mar 05, 2021 14:32

Turns out some of the indexes were reverted automatically because they negatively impacted some queries. Will have to see if it has more data on what exactly those queries were...

As for warming up slots, I still need to wrap my head around that. I think from my previous reading it will only work for slot swapping - which, unfortunately, I still have to make some (a lot of) changes for that to work :/

#249616
Mar 05, 2021 14:36

Thought I would post an update.

So, I set up a secondary "production" slot, and deploying to that and swapping has greatly reduced init time - less than a minute. And I know I can improve that.

Time for that new slot to come up was around the original two minutes.

However, it feels like whichever slot is using the file contents of the original slot is slower. I can track which is which by a particular folder. The first time I swapped to production, it was fairly quick; swapping back was slower again. Then swapping again today was much faster again.

Anyway, this is a temporary measure at best. I cannot leave it running due to licensing issues, so I will need to get my staging swappable with production. It just gives me some leeway, and I no longer need to get up extra early for deployments.

Also, the DB has finally settled too. 

#250250
Mar 16, 2021 8:48