Subscription job deadlock on sites with a large number of users
On web sites with a large number of users you may get a deadlock in SQL Server when executing the EPiServer subscription job.
The error message is something like this: Exception has been thrown by the target of an invocation. [Transaction (Process ID 105) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.]
Cause of the error
The error comes from SQL Server when the subscription job executes the method ProfileManager.GetAllProfiles(). From my investigations, the error appears when users register on the site while this method is executing. On large sites this is a very common situation, since users register all the time.
I used Reflector to copy all the code in EPiServer’s SubscriptionJob class into a new, customized subscription job.
I then made some changes to the code. First of all, there is some code that could be optimized. In my application, the execution time for the subscription job went from 30 to 6 minutes with this optimization.
The method protected virtual int SendSubscriptions(EPiServerProfile profile) does not check whether the user subscribes to any pages. With a large number of users and a custom membership provider, this makes the job run very slowly: the PrincipalInfo is fetched for every user, which means GetUser is called on the membership provider for every user.
A small change to the code fixes this issue: check that the profile subscribes to more than zero pages before doing any further work.
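The idea behind the check can be sketched as follows. This is a language-neutral illustration in Python, not the decompiled EPiServer code: Profile, subscribed_pages and load_user are hypothetical stand-ins for EPiServerProfile, its subscription list and the membership provider's GetUser call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Profile:
    """Hypothetical stand-in for EPiServerProfile."""
    user_name: str
    subscribed_pages: List[int] = field(default_factory=list)


def send_subscriptions(profile: Profile,
                       load_user: Callable[[str], object]) -> int:
    """Return the number of mails sent for this profile (0 or 1)."""
    # Cheap check first: skip users who subscribe to nothing.
    if not profile.subscribed_pages:
        return 0
    # Expensive call (assumed to hit the membership provider),
    # now made only for users who actually have subscriptions.
    user = load_user(profile.user_name)
    # The real job would build and send the mail for `user` here.
    return 1
```

With this ordering, the expensive user lookup runs only for the subset of users who have at least one subscription, which is presumably where most of the saved time comes from.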
The solution to the real problem is more difficult. I have been able to get rid of many deadlocks but not all.
The profiles are fetched in blocks of 1000 in the subscription job. Between each block of 1000, the job does the work of sending subscription emails to those users, which takes some time. A deadlock can then occur when the next 1000 profiles are fetched. I think this happens if a new user has registered in the meantime.
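The loop described above looks roughly like the sketch below. It is a simplified Python model, not the EPiServer implementation: run_job and send are hypothetical names, and the list slice stands in for the paged profile query.

```python
from typing import Callable, Sequence


def run_job(profiles: Sequence[object], page_size: int,
            send: Callable[[object], None]) -> int:
    """Fetch profiles block by block and send mail between fetches.

    Returns the number of block fetches made against the profile store.
    """
    fetches = 0
    for start in range(0, len(profiles), page_size):
        # Stand-in for the paged profile query.
        block = profiles[start:start + page_size]
        fetches += 1
        for profile in block:
            # Slow work: new users may register while this runs,
            # before the next block is fetched.
            send(profile)
    return fetches
```

In this model, 2500 profiles with a page size of 1000 give three fetches with sending work in between, while a page size of 5000 gives a single fetch, so no fetch happens after the slow sending work has started.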
I tried increasing the pageSize for the method ProfileManager.GetAllProfiles() to reduce the time between fetching the first and the last profile. The deadlocks are almost gone. Is there any reason why EPiServer does this job in page sizes of 1000? Is there a better solution to this problem?