This isn’t an EPiServer-specific post as such, but it is a relevant to most people developing with EPiServer because the chances are you will use a fair bit of LINQ – especially if you’re using things like the DDS. I love LINQ. It’s great. It enables you to take potentially complex looping and iterative coding and reduce it to easily readable LINQ statements. I use it primarily on collections of .Net objects, and the ability to do ‘deep’ property searching, matching and even arithmetic operations has led me to being a bit lazy and splashing LINQ around liberally. I even use it to avoid using EPiServer filters and FPWC if I already have a page data collection handy. However, nothing comes free and LINQ, like everything, has a cost. In the case of LINQ for collections, it is performance. (note that other LINQ implementations like LINQ to SQL are implemented entirely differently internally and this post does not apply to them – although you can use other techniques to improve performance for those queries too)
LINQ for collections itself is not inherently badly performing. It’s pretty slick. However, all these LINQ statements actually do is get ‘flattened out’ to code at compile time. (if looking at disassembled/profiled code, it’s those funny <something>b__x() methods you see everywhere that look like generics but aren’t) Internally, in the methods the compiler creates it iterates the collection and does whatever your query needs… ‘where’ clauses become equality checks etc.
At a simple level, LINQ may not perform any better when compared like-for-like to manually written code. In fact, it may even perform worse. Even more than that, it is important to think about what is actually happening. When you realise that you’re actually doing enumeration code, albeit in a clever wrapper, it makes you that much more aware of being careful with what you are doing. For example, let’s say you have a collection of ‘Foo’ objects and you want to find all objects that have a ‘bar’ property set to ‘A’, and all with ‘bar’ set to ‘B’. In LINQ, you could say this:
This looks nice and clean – we have our two enumerable collections populated on a single line each. If ‘fooCollection’ is small, then this is unlikely to be a problem. However, what this code is actually doing is enumerating the whole collection twice looking for each setting of ‘bar’. Even if the LINQ code performed as well as a loop for each item, we are doubling things up. It would therefore be much better to do something like:
This is more lines of code, but it’s only a single loop and we are populating both collections. The whole subject of optimising LINQ and loops goes much further, naturally. For example, it might be a good idea to break down your collections into Dictionary or SortedList collections if you have some kind of key that you use often. That way you can avoid doing a LINQ ‘where’ clause on the collection on every call just to find items for a specific key. For example, let’s say that each item in my collection of ‘Foo’ objects has a reference to a ‘bazID’ and I wanted to get I the sum of all ‘Value’ properties of my ‘Foo’ objects by their ‘bazID’. (this could typically happen if you have something like an order header with order lines). Using un-optimised LINQ I could write this:
What the code above will do is iterate my entire fooCollection for every iteration of the ‘baz’ loop looking for items with the right ‘bazID’, then sum the results. At this simple level, even though the entire collection is being repeatedly iterated that might still be a sensible option just for simplicity, but an alternative option would be to create an ‘indexed’ collection first, for example:
We now run through the whole fooCollection only once to create our ‘indexed’ collection, and after that only the appropriate parts will be iterated by LINQ. This is a fairly trivial example and of course you could achieve the same above using other simple operators and lambda expressions, but it shows the technique. If doing multiple checks against the fooByBazIdCollection then obviously the value of narrowing down the list that is being iterated increases rapidly.
Although nothing in this post is particularly new or complex, I hope that a reminder of what is happening behind the scenes with LINQ for collections might be of some use to you in writing optimal code.