Don't let Regex slow you down
There is a joke among programmers: you have a problem, you decide to use regex, and now you have two problems!
It's funny because it's true! Kidding aside, let's all agree that Regex is incredibly powerful and helpful. But let's also agree that Regex can be expensive to run. We are told to use RegexOptions.Compiled to make it faster.
But is that a good enough practice? Or even "good"?
Today I stumbled upon a bug fix I made 5 years ago. The context of the fix has been lost with time, so I thought it would be interesting to see how much improvement it made. The fix moved a new Regex from an instance field to a static one, so I asked Copilot to write a test comparing three approaches:
- Create a new Regex every time
- Create a new Regex every time but with RegexOptions.Compiled
- Create the Regex once and cache it in a static property
And Copilot produced this error-free code. Scarily, it even suggested using Lazy<T> to delay creating the Regex until necessary.
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Text.RegularExpressions;

[MemoryDiagnoser] // Enables memory allocation tracking
public class RegexBenchmark
{
    private const string Input = "https://example.com/assets/image.png";
    private const string Pattern = @"^https:\/\/example\.com\/assets\/.*$";
    private static readonly Regex StaticRegex = new Regex(Pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);

    [Benchmark]
    public bool NewRegex_Interpreted()
    {
        var regex = new Regex(Pattern, RegexOptions.IgnoreCase);
        return regex.IsMatch(Input);
    }

    [Benchmark]
    public bool NewRegex_Compiled()
    {
        var regex = new Regex(Pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
        return regex.IsMatch(Input);
    }

    [Benchmark]
    public bool StaticRegex_Compiled()
    {
        return StaticRegex.IsMatch(Input);
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        BenchmarkRunner.Run<RegexBenchmark>();
    }
}
And the test result was a surprise to me.
I knew StaticRegex would be the fastest, but I was surprised to learn that creating a new Compiled regex on every call is not only the slowest by a large margin, but also very expensive in terms of allocations. With RegexOptions.Compiled, each construction has to compile the pattern to IL again, so the cost is paid on every call instead of once. That's a no-no, especially on hot paths! For inexperienced developers who think Compiled alone will solve the problem, it is actually creating one!
Do yourself a favor and look for all usages of new Regex in your code base. If the instance is not already held in a static property, it's time to be a hero and fix it! An alternative is to call the static Regex.IsMatch method with the Compiled option, so you never create an instance yourself; the runtime keeps recently used patterns in an internal cache. It is slightly slower than a static Regex instance, but not by much, and the difference could be within the margin of error.
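As a rough sketch of what those options might look like side by side, here is the same pattern from the benchmark held in a static field, wrapped in Lazy<T>, and passed to the static Regex.IsMatch. The class name AssetUrlMatcher and its method names are made up for illustration; they are not from the original fix.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AssetUrlMatcher
{
    private const string Pattern = @"^https:\/\/example\.com\/assets\/.*$";
    private const RegexOptions Options = RegexOptions.Compiled | RegexOptions.IgnoreCase;

    // Option 1: built once when the type is initialized, then reused by every call.
    private static readonly Regex Cached = new Regex(Pattern, Options);

    // Option 2: Lazy<T> defers construction until the first access to .Value.
    private static readonly Lazy<Regex> LazyCached =
        new Lazy<Regex>(() => new Regex(Pattern, Options));

    public static bool IsAssetUrl(string url) => Cached.IsMatch(url);

    public static bool IsAssetUrlLazy(string url) => LazyCached.Value.IsMatch(url);

    // Option 3: no instance in our own code; the static method relies on
    // the runtime's internal cache of recently used patterns.
    public static bool IsAssetUrlStatic(string url) => Regex.IsMatch(url, Pattern, Options);
}

public static class Program
{
    public static void Main()
    {
        const string url = "https://example.com/assets/image.png";
        Console.WriteLine(AssetUrlMatcher.IsAssetUrl(url));       // True
        Console.WriteLine(AssetUrlMatcher.IsAssetUrlLazy(url));   // True
        Console.WriteLine(AssetUrlMatcher.IsAssetUrlStatic(url)); // True
        Console.WriteLine(AssetUrlMatcher.IsAssetUrl("https://example.org/assets/image.png")); // False
    }
}
```

The Lazy<T> variant is probably only worth it when the regex might never be used, since a static readonly field is already initialized just once.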
And if you have time to spare, take a look at ".NET regular expression source generators" on Microsoft Learn for even faster regex creation.
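For a taste of the source generator approach, here is a minimal sketch, assuming .NET 7 or later (where the GeneratedRegex attribute is available). The class name AssetUrlPatterns and the method name AssetUrl are hypothetical; the pattern is the one from the benchmark.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class; it must be partial so the generator can add to it.
public static partial class AssetUrlPatterns
{
    // The source generator emits the body of this method at build time,
    // so no pattern parsing or IL compilation is needed at run time.
    [GeneratedRegex(@"^https:\/\/example\.com\/assets\/.*$", RegexOptions.IgnoreCase)]
    public static partial Regex AssetUrl();
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(AssetUrlPatterns.AssetUrl().IsMatch("https://example.com/assets/image.png")); // True
        Console.WriteLine(AssetUrlPatterns.AssetUrl().IsMatch("https://example.org/assets/image.png")); // False
    }
}
```

RegexOptions.Compiled is left out here on purpose: the generated code is already compiled with the rest of the assembly.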