A shot in the dark but maybe this is with MARS enabled? When I was testing performance for CMS 12 MARS does not work well on Linux
Hi Johan, did you manage to resolve these errors? We're receiving the same error in DXP 😞
Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.
The timeout period elapsed prior to completion of the operation or the server is not responding.
Operation cancelled by user.
The statement has been terminated.
System.ComponentModel.Win32Exception (258): Unknown error 258
We still see them and usually at high volume in proximity of reported downtime. But there can also be tens of millions operations between showings.
Here is another issue with the client package that we are following since we usually see the problem around queries that are doing heavier work. Unfortunately they are not our own queries but if they were this issue thread suggest re-working those calls to be sync and not async.
Johan, what is the status of this now? 5.1.2 seems good still?
(BTW, as a side note, seems like 5.2.0-5.2.1 may have other issues, see Errors after upgrading to 5.2.0 from 5.1.5 on Linux · Issue #2378 · dotnet/SqlClient (github.com). 5.2.2 just released supposed to fix those.
Can someone help explain what Johan means by 'We rolled back from 5.2.* and have been on 5.1.5 over the summer and had good stability' means?
We are on EPiServer.Framework 12.21.6 and that has a dependency on Microsoft.Data.SqlClient (≥ 5.1.4)
So whatever project in my solution that has EPiServer.Framework is then using 5.1.4.
Are you saying you are manually adding the nuget package Microsoft.Data.SqlClient 5.1.5 to projects that have EPiServer.Framework?
Also, if I'm already on 5.1.4 and see these exceptions, is going to 5.1.5 really going to matter?
Can't recall what the status was between 5.1.4 and 5.1.5... But with the 5.2.0-5.2.1 error count went up and probably from a brand new error cause.
Errors like in my initial post are of course not only affected by the SqlClient-code but could also be turning up in states of port exhaustion for network calls or maxed out thread count.
The environment in question also had various upscaling done around the time of the 5.2.0-5.2.1 roll out, both number of cores, new type of app service tier and bumped up Azure SQL.
I guess after rolling back the upscaling in combination with the rollback to 5.1.5 gave us the good stability we see now.
I have this info endpoint added to see some of what the machine has set:
[Authorize(Roles = "CmsAdmins")]
public class EnvironmentInfoController : Microsoft.AspNetCore.Mvc.Controller
{
public Microsoft.AspNetCore.Mvc.ActionResult Index()
{
ThreadPool.GetMinThreads(out var minWorkerThreads, out var minCompletionPortThreads);
ThreadPool.GetMaxThreads(out var maxWorkerThreads, out var maxCompletionPortThreads);
ThreadPool.GetAvailableThreads(out var availableWorkerThreads, out var availableCompletionPortThreads);
var currentProcess = Process.GetCurrentProcess();
var sb = new StringBuilder();
if (currentProcess.Threads.Count > 0)
{
foreach (ProcessThread thread in currentProcess.Threads)
{
sb.AppendLine($"Start time: {thread.StartTime}, Total processor time: {thread.TotalProcessorTime}, Thread ID: {thread.Id}, State = {thread.ThreadState}");
}
}
var content = $"""
Instance name: {Environment.MachineName}
The number of processors on this instance is: {Environment.ProcessorCount}
Command line: {Environment.CommandLine}
Current process thread count: {currentProcess.Threads.Count}
Thread pool thread count: {ThreadPool.ThreadCount}
Thread pool min worker threads: {minWorkerThreads} / completion: {minCompletionPortThreads}
Thread pool max worker threads: {maxWorkerThreads} / completion: {maxCompletionPortThreads}
Thread pool available worker threads: {availableWorkerThreads} / completion: {availableCompletionPortThreads}
Thread info:
{sb}
""";
return this.Content(content, "text/plain");
}
}
Mike, you can force a different version of a transitive dependency either by adding it as a top level dependency, or by using central package management and transitive pinning.
Also: Are you using MARS (MultipleActiveResultSets) on your connection string by any chance?
Just saw this posted:
Fixed an issue by updating the Microsoft.Data.SqlClient dependency to 5.2.2.
https://world.optimizely.com/support/bug-list/bug/CMS-35781
Here's a simple KQL query you can run in AI Logs to summarize exceptions from the Azure SQL calls:
Looking into stability issues and when problems arise we see large amounts of these:
I'm interested if others are also seeing these two exceptions in some volume for their CMS 12 on DXP?
Here's an issue for SqlClient that suggests this may be a broader issue with dotnet core running in Linux containers in general:
https://github.com/dotnet/SqlClient/issues/1530
For the DXP this suggests some tweaks are already in place:
https://support.optimizely.com/hc/en-us/articles/4432366206733-CMS-12-site-crash-due-to-SQL-timeout-error-when-working-in-CMS-edit-mode