One of our Epi (version 9.5) apps (load balanced) is experiencing an issue where one of the web servers gets into some sort of initialization loop. Last time around, in the space of 12 minutes, the InitializationModule was fired 23 times. Most of the time, a fair amount of RAM is released during the incident. However, the app pool is not being recycled.
Has anyone experienced a similar issue?
Add some logging to the module and see if it fails to run for some reason. I think Episerver will try to run it again if it fails, so that might be a reason...
But they shouldn't be running at all as the application is up and running.
Yup, but if the module throws an exception during initialize it might run again on next request...and so on...
But why would it have to initialize again if it has been running for many hours?
In that case it never ran the Initialize method successfully, and it will try again and again until it does.
I'm just guessing, so turning on logging with log4net and adding some extra logging to the Initialize method of the failing module should show whether my guess is correct or not.
Add a try/catch to see if the Initialize method throws any exception, and a Log.Info at the end of the Initialize method to see that it actually gets there.
I know I ended up having a hundred or so event handlers for publish event because my module had an error in it somewhere and I didn't handle it correctly.
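A minimal sketch of the try/catch and logging suggested above, assuming log4net is already configured for the site. The class name LoggedInitializationModule and the log messages are hypothetical; the real work of your own module goes where the comment is.

```csharp
using System;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using log4net;

// Hypothetical module name; apply the same pattern to each custom module.
[InitializableModule]
public class LoggedInitializationModule : IInitializableModule
{
    private static readonly ILog Log =
        LogManager.GetLogger(typeof(LoggedInitializationModule));

    public void Initialize(InitializationEngine context)
    {
        Log.Info("Initialize starting");
        try
        {
            // ... the module's existing initialization work goes here ...

            // Reaching this line shows that Initialize ran to the end.
            Log.Info("Initialize completed");
        }
        catch (Exception ex)
        {
            // Log the failure, then rethrow so the original behaviour is unchanged.
            Log.Error("Initialize failed", ex);
            throw;
        }
    }

    public void Uninitialize(InitializationEngine context)
    {
        Log.Info("Uninitialize called");
    }
}
```

If "Initialize starting" shows up repeatedly without a matching "Initialize completed", the module is failing and being retried. If both lines show up repeatedly, something else is restarting the initialization.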
The site was running for a number of hours before it went into this initialization loop. Just to be on the safe side, I will add logging to all my custom initialization modules.
Thank you very much for your help.
Might be related to load balancing as well. Might be that you get all traffic to a single server(?) and that the other server only starts after 10 mins or so when it gets its first request. Turning on logging will show that as well...
I think I wasn't clear enough when describing the problem. The issue is not present during the initial application start. It happens very randomly, and as far as I can see in the event log, it is not related in any way to an application pool recycle.
I understand. It is strange that this should happen after 10 mins. The only reasons I can think of are the above.
Did you ever get to the bottom of this? We are regularly experiencing this in a load balanced environment as well. The InitializationEngine continues to loop many times and the site never becomes accessible until a manual IIS reset. I'm pretty sure that it is not because of heavy load on a single server, since it also happens at night when load is minimal.
I have added a link containing the contents of the Episerver log when the initialization loop occurs:
Probably worth adding a support ticket. The only thing I see in the logs that looks a bit weird is this:
2016-11-01 03:24:53.337,INFO,EPiServer.Events.Providers.EventProviderService,Cancel sending event message as the EventProviderService doesn't have any configured providers.,
Since you have a load balanced site, remote events need to be configured to make it run properly. Have you configured those? Does cache invalidation seem to work OK?
In order to troubleshoot try: http://www.epiwiki.se/tools/application-restart-detector
In our case setting fcnMode (https://msdn.microsoft.com/en-us/library/system.web.configuration.fcnmode%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396) to disabled did the trick.
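For reference, a minimal sketch of where that setting lives in web.config. fcnMode is an attribute of the httpRuntime element (available from .NET 4.5). Most sites already have an httpRuntime element, so this assumes you add the attribute to the existing element rather than creating a second one.

```xml
<configuration>
  <system.web>
    <!-- Disabled turns off file change notifications for the application -->
    <httpRuntime fcnMode="Disabled" />
  </system.web>
</configuration>
```

Note that with file change notifications disabled, the application will no longer restart by itself when files change, so a restart may be needed after a deployment.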
Thanks for your follow up. I'm struggling to understand how fcnMode = disabled prevents the loop. When I look at our files on the website none of them changed at the recycle time. Which files did you find that actually changed at the recycle of the application pool? Was it modules?
I couldn't see any files being changed. However (as far as I can remember), the Application Restart Detector was returning a ConfigurationChange value.
Checking for application restart seems like a great first step...
If that is the case, you can take a memory dump or similar afterwards to find out exactly what went wrong...
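If you would rather log restarts yourself, here is a minimal sketch using HostingEnvironment.ShutdownReason from System.Web, again assuming log4net is configured. This is not necessarily how the linked detector works, and the ShutdownLogger class is a hypothetical helper.

```csharp
using System.Web.Hosting;
using log4net;

// Hypothetical helper; call ShutdownLogger.LogShutdownReason() from
// Application_End in Global.asax.cs.
public static class ShutdownLogger
{
    private static readonly ILog Log =
        LogManager.GetLogger(typeof(ShutdownLogger));

    public static void LogShutdownReason()
    {
        // ShutdownReason is an ApplicationShutdownReason enum value,
        // for example ConfigurationChange or HostingEnvironment.
        Log.Warn("Application shutting down. Reason: "
                 + HostingEnvironment.ShutdownReason);
    }
}
```

A ConfigurationChange reason in the log at the time of the loop would point towards the file change notification setting discussed above.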
We are having the same issue on Azure web apps. The application will be running, then all of a sudden Azure decides to replace an instance and the site goes into an infinite loop. Azure support thinks, based on their dump analysis, that cached objects are not being properly released. How can we confirm that the cached objects are properly released?
I've seen it happen a few times. Every time it has been a developer mistake: fetching a silly number of objects and then caching them the wrong way. I had one client who fetched the entire AD's worth of users and cached it every 10 minutes, causing memory to spike. If you check through the logic where you do caching, it's almost always possible to build it in a smarter way that doesn't need to cache as much data. Using recursive GetChildren or similar in Episerver to crawl through the entire page tree unnecessarily is not an uncommon mistake. If you cache stuff, avoid caching objects that have connections to plenty of other objects (like PageData). Store only what you need in a separate flat object instead. Normally this will both result in a much smaller cache and make it easier for .NET to garbage collect when necessary. And avoid using static dictionaries or similar to build your own cache; then you are basically screwed when memory is running out. Do use Episerver's caching system, or at least the standard .NET one.
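A minimal sketch of the "flat object" idea, using the standard .NET cache (System.Runtime.Caching) rather than Episerver's own cache. PageSummary, PageSummaryCache, the key format and the 10 minute expiry are all hypothetical, and the load delegate stands in for whatever code reads the page and copies out the few values you need.

```csharp
using System;
using System.Runtime.Caching;

// Hypothetical flat object: only simple values, no references to PageData.
public sealed class PageSummary
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Url { get; set; }
}

public static class PageSummaryCache
{
    public static PageSummary GetOrAdd(int pageId, Func<int, PageSummary> load)
    {
        string key = "PageSummary:" + pageId;

        var cached = MemoryCache.Default.Get(key) as PageSummary;
        if (cached != null)
        {
            return cached;
        }

        // Cache miss: build the flat object and store it with an absolute
        // expiry so the entry can be evicted and garbage collected.
        PageSummary summary = load(pageId);
        if (summary != null)
        {
            MemoryCache.Default.Set(key, summary,
                DateTimeOffset.UtcNow.AddMinutes(10));
        }
        return summary;
    }
}
```

Unlike a static dictionary, MemoryCache entries expire and can be evicted under memory pressure. On a load balanced site, Episerver's own cache is probably the better fit, since this plain .NET cache knows nothing about the other servers.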