Blog posts by Magnus Stråle

New campaign and promotion system in Episerver Commerce

2016-07-06T12:43:23.0000000Z

It has been in development and testing with beta customers for a while, and with Commerce 9.19.0 it has finally dropped the beta stamp!

A note on terminology: in the user interface we use the term "discount", while in code and APIs we use the term "promotion".

Why a new promotion system?

Discounts are one of the most effective ways to incentivize a shopper to complete the purchase. It's therefore a central feature in any e-commerce product and needs to be easy to use, flexible and performant.

The old promotion system in Episerver Commerce is based on the rules engine, a part of Windows Workflow Foundation v3, which Microsoft has made obsolete. It also uses a fairly advanced and complex UI to define promotions, which is not at the usability level we strive for. These were the primary factors driving us to rewrite the promotion system.

The idea is that we (Episerver) will provide most of the basic types of promotions that will be needed, and allow complete customization for those who have needs beyond what we provide out of the box. For the marketers and merchandisers who will use those promotions, we offer a simple "fill in the blanks" metaphor.

What is it - exactly?

The main parts are:

New campaign concept
Code-centric definition of promotions
New UI for managing campaigns and discounts
Extensible promotion engine

Let's go through them one by one, starting with the campaign concept. In the old promotion system, campaigns were only used to hold dates and to associate promotions with the campaign. Initially the new campaign concept is little more than that, but in the near future you will see a lot more benefits from the new campaigns, such as instant tracking of order value per campaign. Technically, a campaign is a content type in Episerver CMS and can be imported/exported in the same way as any normal CMS content.

A promotion consists of two parts. The first is the promotion data, which holds metadata about the promotion, such as which products/categories the promotion applies to, the required quantity, and the type and value of the discount that can be awarded. The second part is the promotion processor, which implements the actual logic of checking whether the promotion condition has been fulfilled and returns information about the reward (if any) that should be applied.
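To make the data/processor split concrete, here is a minimal C# sketch. Every type in it (LineItem, PercentOffPromotionData, Reward, PercentOffProcessor) is hypothetical and made up for illustration; it is not the Episerver Commerce promotion API, where the engine takes care of much of this work. The data record holds only what a marketer would fill in, while the processor holds the logic that checks the condition and computes the reward.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only; NOT the Episerver Commerce promotion API.
public record LineItem(string Code, int Quantity, decimal UnitPrice);

// Part 1: the promotion data, i.e. the "fill in the blanks" values.
public record PercentOffPromotionData(
    ISet<string> TargetCodes,  // products the promotion applies to
    int RequiredQuantity,      // minimum total quantity of targeted products
    decimal PercentOff);       // discount value, e.g. 10 for 10 %

// What the processor reports back to the caller.
public record Reward(bool Fulfilled, decimal Amount);

// Part 2: the processor, i.e. the logic that evaluates the condition and computes the reward.
public static class PercentOffProcessor
{
    public static Reward Evaluate(PercentOffPromotionData data, IEnumerable<LineItem> items)
    {
        // Only line items whose code is targeted by the promotion count.
        var targeted = items.Where(i => data.TargetCodes.Contains(i.Code)).ToList();

        int quantity = targeted.Sum(i => i.Quantity);
        if (quantity < data.RequiredQuantity)
            return new Reward(false, 0m); // condition not fulfilled, no reward

        decimal subtotal = targeted.Sum(i => i.Quantity * i.UnitPrice);
        return new Reward(true, subtotal * data.PercentOff / 100m);
    }
}
```

In this sketch, keeping the two apart means a UI only needs to know about the data record, and a new promotion type is mostly a new data record plus one processor.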

We have tried really hard to make it as easy as possible to create new promotion types, with a lot of the hard work being done by the underlying promotion engine. Source code for the built-in promotion types will shortly be published on GitHub.

A lot of work has also gone into the new UI to make the process of creating and managing your campaigns and discounts a quick and intuitive process. Please let us know if we've succeeded.

Finally, there is the promotion engine that drives the whole thing. The design goals were performance, simplicity and extensibility. You can find detailed information for developers here: http://world.episerver.com/documentation/Items/Developers-Guide/Episerver-Commerce/9/Marketing/marketing/

What's next?

Removing the beta stamp doesn't mean that we're done. We will add more promotion types and more UI features, improve performance, add capabilities etc. Removing the beta stamp signals that we will not make breaking changes, and that we're happy with the stability and general usability of this feature.

...and please - if you have feedback let us know!

There will be a Commerce 9 - which is a non-event

2015-04-15T15:50:52.7570000Z

An update to let you know that we are going to do a Commerce 9 release. With good results from the Commerce 8 and CMS 8 releases, I think everyone has a better idea of what major version releases from EPiServer mean today, so this should not be a big thing. We will keep the breaking changes to a minimum, so for many customers / partners this will not have any impact at all.

Why a major version release?

The reason is technical: it allows CMS Core to continue improving our logging system (we expose data types from log4net in a public API; removing them is a breaking change). So, to reiterate what has been said in other channels: version numbers only carry a technical message.

Since we are going to do a breaking-change release anyway, there are a few other things we're planning to do. Please send us your feedback, whether you think they're good or bad.

Remove the dependency on WWF (Windows Workflow Foundation). WWF 3, which we are using, is now obsolete and (officially) not supported on Azure. We will provide a compatibility layer that allows customized workflow activities to be reused with VERY MINOR modifications. NOTE! It will NOT be 100% source-code compatible, so developer resources are required for an upgrade. On the positive side, a number of partners have asked for (even begged for) the removal of WWF for quite some time, so I expect this will not be such a negative thing after all.
Remove remains of Mediachase CMS. Most of the Mediachase CMS code, along with database tables, are still present in EPiServer Commerce. This is just unnecessary baggage that adds no value.
Implement improved version handling for the catalog system. This feature should be largely invisible; the only noticeable aspect should be improved performance, hopefully significantly improved performance. The feature itself might not be strictly breaking, but shipping it with Commerce 9 gives us a bit more flexibility.

When is this going to happen? No firm dates, but at least a few months into the future.

-- UPDATE --

We're getting closer to a release... The good news is that performance has improved significantly: real-world tests have shown 2x - 10x performance improvements in catalog-related operations. As always, your mileage may vary.

Impressions from the SPLASH 2011 conference

2011-11-01T07:58:00.0000000Z

I have been to a number of IT-related conferences, but they have all been industry/practitioner events, usually the Microsoft PDCs etc. These conferences are good and focused on delivering knowledge that you can take home and use today. However, I really wanted to try one of the more academic conferences and see what I could pick up there. The OOPSLA conference is something I have heard of over the years, so I decided that was probably a good bet (see http://splashcon.org/history/ for a couple of the "big things" that have originated there). For various reasons it has changed its name to SPLASH (a very convoluted acronym that stands for Systems, Programming, Languages and Applications: Software for Humanity). I soon found out that acronyms are something of an addiction in the academic world; I quickly got lost among FOOL, PLATEAU, FREECO, AGERE etc.

Since this conference covered so much, I will not try to cover everything, but just give a very broad overview and highlight a few points of interest.

Theme 1 - JavaScript rules!

Much research with regards to programming languages was focused on JavaScript since it is such a popular language. This was a bit of a surprise to me, I expected more work around functional languages.

The keynote speech for the last day was given by Brendan Eich, the original creator of JavaScript. It was an interesting talk about the history of the language (it was prototyped in 10 days before the spec was frozen) as well as the current and future developments with ECMAScript 5 and 6. JavaScript will move away from dynamic scoping, and when ES6 comes along it should (hopefully) use only lexical scoping. One of the demos was really cool: a video decoder written entirely in JavaScript and running at 30 fps! See http://arstechnica.com/open-source/news/2011/10/native-javascript-h264-decoder-offers-compelling-demo-of-js-performance.ars for more info.

Theme 2 - parallelism, concurrency, multi-core, many-core...

I cannot remember how many times I heard "The free lunch is over", referring to Herb Sutter's classic article http://www.gotw.ca/publications/concurrency-ddj.htm . Everyone was experimenting with various approaches for taking advantage of many-core architectures, improving the parallelism of serial programs, improving synchronization mechanisms etc.

Speculative execution "in the large": you know that today's CPUs use speculative execution to improve performance, but now there are experiments with using many-core architectures to automatically execute various permutations of your code in parallel and simply pick the version that finishes first.

Another really interesting thing was a concept that David Ungar from IBM talked about: forget about trying to ensure the correctness of your code with synchronization, since it is just too expensive. Instead, accept that your code will sometimes be wrong and try to implement recovery strategies. I am a proponent of eventual consistency for the sake of performance, but giving up on the notion of correctness? Maybe in a few well-defined cases, yes, but I have a hard time convincing myself that correctness can be a secondary concern.

Theme 3 - Java and the JVM

Maybe not as much of a theme as the other two, but almost every time there was a talk about introducing new language features or modifying runtime behavior, the JVM (and sometimes Java) was the subject being experimented on. For example one talk was about using asynchronous assertions - basically breaking out asserts in your Java program and running them on one or more separate threads. Implemented by doing some brain-surgery on the JVM.

I talked to a couple of the presenters and it seems to be the general view that the current JVM implementations are really outstanding examples of good software engineering. However the Java language was generally not held in high regard.

Academic conference indeed...

The event started on Sunday with a number of workshops. I decided to go to the FOOL (Foundations of Object-Oriented Languages) workshop and quickly found out that this was really the stuff for die-hard mathematicians: mathematical descriptions of JavaScript's memory/variable model, more mathematical reasoning about events in combination with object state changes, etc. Somewhat overwhelming and intimidating, but the highlight was getting to talk with some really bright people. Even though their work is highly research-oriented and academic in nature, everyone seemed really interested in applying their knowledge to real-world problems. The researchers really want to work with us practitioners (that's us, the people who make a living creating software).

...but also a lot of pragmatic views

My favourite talks:

"How to handle 1 000 000 daily users without using a cache" by Jesper Richter-Reichhelm from WOOGA
Described the challenging road from a traditional three-layered database approach, to doing both horizontal and vertical partitioning of data, to moving to a hybrid solution with in-memory NoSQL as well as a traditional SQL database. All in the name of performance and scalability.
"Erlang - The Road Movie" by Kresten Krab Thorup from Trifork
Contrasting the object-oriented view that "everyone" in the software industry has today with the agent-approach of Erlang, this was a real eye-opener for me. The object-based paradigm is so dominant that it is hard to even see that there are alternatives. Erlang is the next language that I will learn. Period.

Sorry - I could not find the slides for these presentations available online...

[Update - slides for "Erlang the Road Movie" available here http://www.slideshare.net/drkrab/erlang-the-road-movie-gotocph-2011-keynote ]

CSS styles in the editor

2011-10-04T04:25:06.0000000Z

First I want to point out that I am not a designer in any way (ask some EPiServer old-timer about the "Magnus Login dialog"...) and I have limited experience working with CSS. In other words, this blog post is way outside my usual comfort zone. However, I recently added support for using site-identical styling in the editor and discovered a way of doing this without duplicating CSS information or adding attributes that cause CSS not to validate.

Start by reading this blog post by Marek Blotny http://marekblotny.blogspot.com/2009/05/how-to-define-custom-styles-in.html

The problem is that you either have to have non-standard attributes (EditMenuName) in your CSS files or you have to duplicate the CSS content in your own editor css file.

The approach I came up with is to use an often-overlooked feature of the uiEditorCssPaths setting. Note that the name ends with an "s"... Aha, so you can have more than one CSS file listed here!

The common "best practice" is to have a reset.css (should be a separate file) and then the actual css styling, let's call it Common.css, that you want to use. See http://sixrevisions.com/css/css-tips/css-tip-1-resetting-your-styles-with-css-reset/ or Google for css reset best practice. Now you simply create an editor.css that only references the tags and / or styles that you want to make available for the editors by setting the EditMenuName attribute.

A sample editor.css:

h1 { EditMenuName: Header 1; }
h2 { EditMenuName: Header 2; }
h3 { EditMenuName: Header 3; }
.enoLink { EditMenuName: Link; }
.enoClearFix { EditMenuName: Clear Fix; }

Now set the uiEditorCssPaths in web.config / episerver.config:

uiEditorCssPaths="~/Templates/Styles/Reset.css, ~/Templates/Styles/Common.css, ~/Templates/Styles/Editor.css"

Voilà - your selected tags and styles are available in the editor without breaking CSS validation or introducing duplication.

Do we really need yet another HTML parser?

2011-03-03T07:50:00.0000000Z

In EPiServer Framework we now (since the Community 4 release) ship a new HTML parser. I will do some more technically oriented posts later on, but for now I just wanted to explain why we decided to invest in a new parser.

Why do we need an HTML parser at all?

Short aside - the "HTML parsing" that we do is lexical analysis and tokenization. In a Computer Science class we would not get away with calling this parsing since we don't really care about the syntax.

In EPiServer CMS we primarily need HTML parsing for Friendly URL (aka FURL) rewriting of outgoing HTML. It is also used to deal with the permanent link scheme used when storing CMS data to the database. As part of this process we also do the "soft link" indexing. There are also a few other situations, for example allowing a subset of HTML for untrusted user input (Relate), where a solid HTML parser can help. Finally there are also all sorts of interesting scenarios that you, our partners, come up with - pulling information from other sites and extracting links, custom markup language etc.

What's wrong with the SGML Reader that we use today?

The SGML Reader is basically an XML reader (built around the same .NET infrastructure as the XML readers) that accepts malformed XML / HTML.

Unfortunately that codebase is very complex. There are a couple of long-standing bugs that we have been unable to fix. Another aspect is that the SGML Reader forces your HTML code into well-formed XHTML. This is usually the right thing to do, but in some cases you don't want your HTML code to be reformatted at all.

The XML reader model with returning the node and attributes separately also causes client code to be much more complicated than necessary, usually forcing you to use an event-based architecture. I will give some examples of this in future posts.

We are not happy with the SGML Reader, but there are other HTML parsers. Why not use one of them?

Very good question; it is so easy to fall into the "Not Invented Here" trap. Creating a good HTML parser is a major undertaking, so let's first go through the "must haves" for our parser and compare them to existing HTML parser implementations:

1. High performance.
Since it is used to parse outgoing HTML, it will be called very frequently. Today the FURL rewriting is responsible for 5 - 20% of the page execution time.
2. Streaming model.
Since it is used very frequently and with HTML responses of unknown size, it would be bad to have to keep the entire HTML in memory at the same time. We need a streaming model to handle this.
3. Easy to maintain and easy to use.
This is a must for any piece of software, but I mention it here due to the issues we've had with SGML Reader.
4. Minimal changes to HTML after a roundtrip in the parser.
If you write HTML in a specific way, you probably do it for a reason. We should not modify it unless requested or absolutely necessary.
5. Pure CLR implementation.
We do not want complicated installation procedures with COM registration or native-code libraries.

Now let's take a look at the existing parsers:

SGML Reader (http://archive.msdn.microsoft.com/SgmlReader): XML DOM-based parsing, although it can be used without actually creating an XML DOM. As previously noted, it fails on #3 and #4.
HTML Agility Pack (http://htmlagilitypack.codeplex.com/): Very nice API, but it does not support a streaming model. Everything gets read into memory before you can act on it, breaking #2.
Majestic-12 (http://www.majestic12.co.uk/projects/html_parser.php): Extremely fast, but enforces a hard, compile-time limit on the size of the HTML, and the API is a bit clunky, breaking must-haves #2 and #3.
LINQ to HTML (http://www.justagile.com/linq-to-html.aspx): Nice API, but still DOM-based, breaking #2.
TidyForNet (http://tidyfornet.sourceforge.net/): A wrapper for the native-code HTML Tidy library, breaking #5.

(Please let me know if there are any interesting libraries that I have missed.)

What are the features that make the new HTML parser so special?

Streaming model.
The parser simply returns a stream of HTML fragments, no DOM, no big pile of data in memory.
High-performance.
Detailed benchmarks have only been done against SGML Reader, which is already fast, and HtmlStreamReader outperforms SGML Reader by 10 - 50%.
LINQ support.
Since the HtmlStreamReader implements IEnumerable<HtmlFragment> it directly supports LINQ-to-objects. Alternatively you can just do a foreach over the results if you want to do classical looping.
Roundtripping (reading from one stream and writing the data to another) through HtmlStreamReader makes minimal changes to your code.
The only thing we touch is whitespace in HTML elements; everything else is left intact unless you explicitly enable fixups such as correcting mismatched tags.
Support for correcting common issues with HTML.
The parser can automatically insert missing end tags, enforce the empty content model and a few other tricks to clean up your HTML. All these fixups are optional.
Handling of malformed data is compatible with common browser behavior.
If you have serious errors in your HTML code, such as leaving out the closing bracket ( "<b>Bold</b<i>Italic</i>" ) the parser will correct and return elements according to the same heuristics as most major browsers.
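The streaming idea can be illustrated with a deliberately tiny tokenizer. This is a hypothetical toy (ToyHtmlTokenizer and Fragment are made-up names), not HtmlStreamReader: it only separates tags from text and ignores comments, scripts, quoted '>' characters and all of the error recovery described above. What it does share with the approach described here is that it reads a TextReader one character at a time and hands out fragments lazily through IEnumerable, so foreach and LINQ work directly and nothing beyond the current fragment is buffered.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// A toy fragment: either a tag ("<b>", "</b>") or a run of text.
public record Fragment(bool IsTag, string Raw);

// Hypothetical toy tokenizer; NOT the EPiServer HtmlStreamReader.
public static class ToyHtmlTokenizer
{
    public static IEnumerable<Fragment> Read(TextReader reader)
    {
        var buffer = new StringBuilder(); // holds only the fragment being built
        bool inTag = false;
        int c;
        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            if (ch == '<' && !inTag)
            {
                // A tag starts: emit any pending text first.
                if (buffer.Length > 0)
                {
                    yield return new Fragment(false, buffer.ToString());
                    buffer.Clear();
                }
                inTag = true;
            }
            buffer.Append(ch);
            if (ch == '>' && inTag)
            {
                // The tag is complete: emit it and go back to text mode.
                yield return new Fragment(true, buffer.ToString());
                buffer.Clear();
                inTag = false;
            }
        }
        // Emit whatever is left (trailing text or an unterminated tag).
        if (buffer.Length > 0)
            yield return new Fragment(inTag, buffer.ToString());
    }
}
```

For example, ToyHtmlTokenizer.Read(new StringReader("<p>Hi <b>there</b></p>")).Where(f => f.IsTag) yields the four tags one at a time, without ever building a list or a DOM.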

None of these features in itself makes HtmlStreamReader unique, but the combination of features is perfect for our needs. I hope you will find it useful too!

Since I feel that a blog post is not complete unless it contains at least a few lines of code, here is a short sample showing off the LINQ capabilities. This code snippet lists all external references from the start page of the Swedish newspaper DN.

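using System;
using System.IO;
using System.Linq;
using System.Net;
using EPiServer.HtmlParsing;

// Stream the DN start page through HtmlStreamReader; no DOM is built.
using (var response = WebRequest.Create("http://www.dn.se/").GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    var html = new HtmlStreamReader(reader);

    // Keep src/href attributes pointing to absolute http:// URLs outside www.dn.se.
    var result = html
        .OfType<ElementFragment>()
        .SelectMany(e => e.Attributes)
        .Where(a => (a.Token == AttributeToken.Src || a.Token == AttributeToken.Href) &&
                    !a.UnquotedValue.StartsWith("http://www.dn.se") &&
                    a.UnquotedValue.StartsWith("http://"));

    // The query is lazy, so enumerate it while the response is still open.
    foreach (var attribute in result)
        Console.WriteLine(attribute.UnquotedValue);
}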

Dynamic property performance improvements

2010-10-20T10:53:06.0000000Z

I won't give you a background on dynamic properties here. If you don't know about them, this blog post will probably not make much sense to you anyway.

The bad

Dynamic properties in EPiServer CMS have a bad reputation for killing performance. The actual problem is not really accessing the properties; it is an old design decision that optimizes the "read dynamic property from cache" part. You could see this as an effect of premature optimization, where the big picture was not clear when the decision was made.

Getting a dynamic property from cache is extremely fast - yes you do have an overhead compared to "regular" properties, but comparing this to the overall execution time for a page it is close to negligible.

Why is it a bad thing to optimize heavily to what you would regard as the most common use-case for dynamic properties (reading them for a page)?

Well, in order for "read from cache" to be as fast as possible, we store a short-circuited tree structure in the cache, with nodes linking directly only to pages that have an actual dynamic property definition. This tree takes quite a lot of work to set up, and most of that work is done in the database procedure that gets the data. That is, when the cached information is flushed, performance gets bad.

Since we store a partial representation of the tree structure it needs to be flushed as soon as you move or delete a page and that is the core of the dynamic property performance problems.

The good

It's fixed. Yes, it is really fixed now. We no longer flush the cache on move, the actual operation of reading dynamic properties from the database is fast and you should no longer see any bad side-effects of using dynamic properties.

We de-optimized the "read from cache" scenario and that allowed us to simplify the code and data structures. Instead of doing a quick lookup in a hash table, we do a standard "follow the parent chain" loop in order to find pages with dynamic property definitions. This makes the read access for dynamic properties slower than it used to be, but it is still in the <1% range of total page execution time.
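The "follow the parent chain" lookup is easy to picture with a small sketch. The class below is hypothetical (DynamicPropertyLookupSketch is not an EPiServer type) and models the page tree as a plain parent map: resolving a property walks upwards until it finds a page that defines it. Because nothing derived from the tree structure is cached in this simplified model, moving a page only updates one parent link and nothing has to be flushed.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified model; NOT the EPiServer CMS data structures.
public sealed class DynamicPropertyLookupSketch
{
    private readonly Dictionary<int, int?> _parentOf = new();  // page id -> parent id (null for root)
    private readonly Dictionary<int, Dictionary<string, string>> _definitions = new();

    public void AddPage(int pageId, int? parentId) => _parentOf[pageId] = parentId;

    public void Define(int pageId, string name, string value)
    {
        if (!_definitions.TryGetValue(pageId, out var props))
        {
            props = new Dictionary<string, string>();
            _definitions[pageId] = props;
        }
        props[name] = value;
    }

    // Follow the parent chain until a page defining `name` is found; null if none does.
    public string Resolve(int pageId, string name)
    {
        for (int? current = pageId; current.HasValue; current = ParentOf(current.Value))
        {
            if (_definitions.TryGetValue(current.Value, out var props) &&
                props.TryGetValue(name, out var value))
                return value;
        }
        return null;
    }

    // Moving a page changes a single parent link; there is no derived structure to flush.
    public void Move(int pageId, int newParentId) => _parentOf[pageId] = newParentId;

    private int? ParentOf(int pageId) => _parentOf.TryGetValue(pageId, out var parent) ? parent : null;
}
```

The cost is one dictionary lookup per ancestor instead of a single precomputed lookup, which mirrors the trade-off described above: slightly slower reads in exchange for much simpler invalidation.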

The nice side-effect

There is a related optimization which you may want to use in your code. The default DataFactory.GetPage() API uses the default LanguageSelector implementation, which contains quite a lot of logic for dealing with language fallbacks etc. In some cases you don't really care about the language-specific information; in the dynamic property case we just want to iterate over the parent chain. We therefore have a new NullLanguageSelector, which simply uses whatever language is present in the returned PageData object.

To use it yourself you can write code like this:

var page = DataFactory.Instance.GetPage(pageLink, NullLanguageSelector.Instance);

If you make a lot of GetPage calls that only use non-language-specific information, you may see a fair performance improvement, in the range of 5 - 15% for this specific API.

When can I see the new dynamic property goodness?

In EPiServer CMS 6 R2.

Debugging initialization modules

2010-08-12T16:25:00.0000000Z

I have covered the new initialization system, as introduced with EPiServer CMS 6 and EPiServer Community 4, in previous blog posts. See http://world.episerver.com/Blogs/Magnus-Strale/Dates/2009/12/The-new-initialization-system---How-to-write-an-initialization-module-and-advanced-topics/ Back from a long summer vacation, one of my first tasks was related to a new initialization module, and I wanted to do a quick “debug exploration” before starting on the work proper.

However I utterly failed to get the debugger to halt on a breakpoint in the initialization module. After a coffee I realized that the problem was that the initialization modules execute before the debugger actually hooks up to the process due to the initialization code being invoked from the HttpApplication constructor (actually the EPiServer.Global constructor).

However there is a simple workaround:

Open the EPiServerFramework.config file.
Remove the <automaticSiteMapping> section.
Save the file.
Set your breakpoints in Visual Studio.
Start your EPiServer web app from Visual Studio.

The debugger will now break, since the initialization system aborts on the first invocation and delays execution until the first BeginRequest, at which point the debugger is attached and functional.

Note that the file EPiServerFramework.config will be automatically updated so you might want to set it to read-only to avoid having to edit it constantly.

Page Type Builder and EPiServer CMS

2010-06-22T14:22:39.0000000Z

Page Type Builder (a.k.a. PTB) by Joel Abrahamsson is one of the most popular Open Source projects for EPiServer CMS developers. It helps increase developer productivity when creating EPiServer solutions. See http://pagetypebuilder.codeplex.com/ for information and links.

A History Lesson

Let's go back to EPiServer 3. It was built on ASP with a few COM objects thrown in for good measure. ASP used VBScript as the programming language and VBScript is not exactly a strongly typed language (remember VARIANTs?). To make the data for a page available to the ASP template developer we used a Scripting.Dictionary - basically the same as a Dictionary in .NET.

When we started to develop EPiServer 4, moving to the .NET platform, we needed to remain fairly compatible with the page template development model in EPiServer 3, but still benefit from the .NET framework. PageData was the perfect middle-ground: the "well known" data could be accessed in a strongly-typed manner and the data defined at runtime was accessed with untyped "dictionary syntax".

Looking at the projects done with EPiServer 3, there was usually a need to start feeding the site with information before development of the page templates was done. By allowing somebody without programming skills to define the page type, and using that as the basis for what we display in edit mode, it was possible for users to immediately start entering information into the site.

Page Type Builder - A Perfect Complement

PTB is an excellent solution for those people that want a strongly-typed model for their page template development as well as circumventing the issue of synchronizing changes to the page type definition.

This means that you can have the best of two worlds - the "EPiServer classic" approach or the "code only" PTB approach. The choice is yours!

Why doesn't EPiServer simply take/buy PTB and provide it out of the box? As good as PTB is, it is not the ultimate solution, but rather a way to add features that complement the original design decisions.

We will gladly help Joel and anyone using PTB to try to ensure that it works as well as possible, but we will not do PTB ourselves, nor can we guarantee that PTB will work with all EPiServer modules (although we will certainly try).

The Future

EPiServer CMS will provide a way to support strongly typed models. EPiServer CMS will probably move towards a more declarative approach for working with page types, reducing the need to work with a page type UI.

I will not make any promises as to when this will happen. The purpose of this blog post is to give the official EPiServer view of PTB, not to spill the beans on our development plans.

Cookieless Session Support in EPiServer

2010-06-22T08:56:16.0000000Z

No.

As in "No, we don't support cookieless sessions when using Friendly URLs". Your natural reaction is probably to ask "Why not?". The basic design guideline for our software is that we should build upon and extend the .NET / ASP.NET framework, and that would imply that we should support cookieless sessions (CS from here on).

If you have never heard of CS before you probably want to check this article for some background information. http://msdn.microsoft.com/en-us/library/aa479314.aspx

As hinted in the article, the implementation of CS in ASP.NET is somewhat fragile in itself. There are a couple of guidelines you have to follow for your site to work properly with CS. Basically, you need to make sure that all URLs you put in the HTML response are relative, or you need to call a special method (HttpResponse.ApplyAppPathModifier) to include the CS URL segment.

Another aspect of CS that is far from ideal is the fact that it is extremely easy to hijack an existing session - you simply need to copy the URL. Cookies are here to stay and my personal opinion is that CS is a leftover from the old "cookies are evil" debate.

Back to the techie stuff - why does this not work with FURL? All links that we generate from permanent links could be adapted to call ApplyAppPathModifier if we detect that CS is enabled. Unfortunately there is another thing that will break the solution. If you generate links in the standard ASP.NET way and that involves the System.Web.UI.Control.ResolveUrl method, you will implicitly call ApplyAppPathModifier and write the CS URL segment.

This is bad since there is no way for the FURL module to reliably detect the CS URL segment in a URL (short of doing nasty reflection tricks) and we need to do that in order to properly rebase all URLs when we do FURL rewriting.

Finally there are a lot of complications when dealing with CS links from JavaScript (which we do a lot from edit mode) which would require a significant investment to update and fully support CS.

To summarize: in order to support the somewhat brittle CS system we would need to introduce even more restrictions. Since there have been practically no requests for EPiServer to support CS, we have decided to say "CS is unsupported" for the time being.

Changes in the initialization system from EPiServer CMS 6 RC1

2010-02-11T07:43:00.0000000Z

For some background you may want to take a look at http://world.episerver.com/Blogs/Magnus-Strale/Dates/2009/12/The-new-initialization-system---How-to-write-an-initialization-module-and-advanced-topics/ and especially the feedback which was entirely correct.

The presence of an IsInitialized property that should not be touched by the initialization module that you write is counter-intuitive. I also saw some internal confusion as to how the IsInitialized property should be handled and we have therefore decided to remove it and move the state handling into the initialization engine instead.

The blog post referred to above will be edited shortly to reflect the changes, which are:

The IsInitialized property has been removed.
There is an IsInitialized method on InitializationEngine that takes an IInitializableModule and returns true if that module has been properly initialized.

Note that this is a breaking change from EPiServer CMS 6 RC1! At the very least you will need to recompile your code if you have written your own initialization module. It is also highly recommended that you remove the IsInitialized property from your module implementation.
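The idea of moving the state into the engine can be sketched as follows. All names here (IInitializableModuleSketch, DelegateModule, InitializationEngineSketch) are hypothetical stand-ins, not the EPiServer.Framework types; the point is only that the engine, not the module, records which modules completed Initialize successfully, so a module author has no flag to get wrong.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a module interface; NOT EPiServer's IInitializableModule.
public interface IInitializableModuleSketch
{
    void Initialize();
}

// Convenience module that runs a delegate, used here to keep the example short.
public sealed class DelegateModule : IInitializableModuleSketch
{
    private readonly Action _initialize;
    public DelegateModule(Action initialize) => _initialize = initialize;
    public void Initialize() => _initialize();
}

// Hypothetical engine that owns the "is initialized" state for every module.
public sealed class InitializationEngineSketch
{
    private readonly HashSet<IInitializableModuleSketch> _initialized = new();

    public void Run(IEnumerable<IInitializableModuleSketch> modules)
    {
        foreach (var module in modules)
        {
            if (_initialized.Contains(module))
                continue;                 // already done on an earlier run
            module.Initialize();          // if this throws, the module is NOT marked
            _initialized.Add(module);
        }
    }

    public bool IsInitialized(IInitializableModuleSketch module) => _initialized.Contains(module);
}
```

With this shape, a module that throws is simply retried on the next Run, and running the engine twice never initializes a successful module twice.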

The truth is out there…

2010-02-08T08:56:00.0000000Z

I have just spent a couple of late evenings hacking away on a piece of performance critical code and I have naturally been profiling it extensively. Here are a few lessons learned that might help some of you doing the same.

Running code under a profiler will affect the performance characteristics
In most cases a profiler will give a good indication of your performance bottlenecks, but there are occasions when it will point you in the wrong direction. My biggest mistake was to spend 4+ hours rewriting a small (10 lines) routine over and over again since dotTrace (running in Tracing profiling mode) indicated that it took > 20% of the execution time. It seems as if the Tracing profiling mode in dotTrace will skew the results heavily when making frequent calls to small routines. When running the same code in Sampling mode my small method didn't even show up. Doing "custom profiling" with the Stopwatch class also showed the same result as with Sampling mode.
[UPDATE! using the EAP version of dotTrace 4 and running with CPU Instruction tracing the Tracing mode worked a lot better - comparing it to Sampling mode the difference was usually in the range 1 - 2 % which is quite acceptable]
Use real-life data when measuring performance
This is a no-brainer, but I was running with highly artificial data for a long time since I was "just in early development stages - will fix it later"-mode.
Don't wait too long with performance testing
Use performance testing early in the project to evaluate the efficiency of various approaches to solving the problem at hand. Doing a complete rewrite after you have a nice, working (but too slow) solution is not fun.
Don't optimize too soon
This may seem contrary to the previous paragraph, but this is just trying to point out that early performance testing should not make you focus on fine-tuning a few methods, but keep your eyes on the big picture. Keep your code as clean as possible for as long as possible.

I have personally found that "Optimize too soon" is usually where I fail. The simple reason is that it is so much fun comparing measurements and seeing that you have just cut another 2% in execution time...

Finally, a good source of inspiration is Michael Abrash's series of articles on optimizing the Quake engine: http://www.bluesnews.com/abrash/

The new initialization system - How to write an initialization module and advanced topics

2009-12-21T10:32:00.0000000Z

Here is the third part of the blog series about initialization in EPiServer CMS. For background you may want to read http://world.episerver.com/Blogs/Magnus-Strale/Dates/2009/12/Initialization-in-EPiServer-CMS-5/ and http://world.episerver.com/Blogs/Magnus-Strale/Dates/2009/12/Introducing-the-new-Initialization-System-in-EPiServer-CMS-6/

[NOTE - this article has been edited to show changes introduced after RC1. Strikethrough is used to indicate things no longer relevant after RC1]

Writing your own initialization module

This should usually be a very straightforward process. I will start by showing two different examples, one very simple and one more complex. The code shows most of the recommendations outlined later in this post. To start with, here is the very simple initialization (actual code from the EPiServer.Data assembly):

[InitializableModule]
public class DataInitialization : IInitializableModule
{
    public static DataInitialization Instance
    {
        get;
        private set;
    }

    #region IInitializableModule Members

    public void Initialize(InitializationEngine context)
    {
        // Set up the default factory, then publish this module instance
        DynamicDataStoreFactory.Instance = new EPiServerDynamicDataStoreFactory();
        Instance = this;
    }

    public void Uninitialize(InitializationEngine context)
    {
        // Undo the work of Initialize in reverse order
        Instance = null;
        DynamicDataStoreFactory.Instance = null;
    }

    // No longer part of IInitializableModule after RC1 (the engine now tracks initialization state)
    public bool IsInitialized
    {
        get;
        set;
    }

    public void Preload(string[] parameters)
    {
        throw new NotImplementedException();
    }

    #endregion
}

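In the code above we assign a default implementation to the DynamicDataStoreFactory.Instance property (and publish the module through its own static Instance property) in Initialize, and undo both assignments in Uninitialize.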

A somewhat more complex example (code snippet from the EPiServer.Framework assembly):

[InitializableModule]
public class SiteMappingConfiguration : IInitializableModule
{
    //
    // Code deleted to focus on the initialization...
    //

    #region IInitializableModule Members

    public void Initialize(InitializationEngine context)
    {
        var section = EPiServerFrameworkSection.Instance;
        InitializeFromConfig(section.SiteHostMapping);
        var configSiteId = SiteIdFromConfig(section); // Get info from config file
        var actualSiteId = SiteIdFromRequest();       // Get info from HttpContext / Request / ServerVariables / host

        // If both actual and config siteId are null, we cannot determine the siteId at this point in time
        if (actualSiteId == null && configSiteId == null)
        {
            throw new TerminateInitializationException("Cannot determine siteId at this time - wait for BeginRequest");
        }

        // If actual is null, the info is in configuration - just use it
        if (actualSiteId == null)
        {
            SiteId = configSiteId;
        }
        // Actual has information that differs from config (or config is empty): use actual and update config
        else if (actualSiteId != configSiteId)
        {
            SaveSiteIdInConfig(actualSiteId);
            SiteId = actualSiteId;
        }
        // Both actual and configured siteId are the same, just use it
        else
        {
            SiteId = actualSiteId;
        }

        Instance = this;
    }

    public void Uninitialize(InitializationEngine context)
    {
        // Reset all state set up by Initialize
        Instance = null;
        SiteId = null;
        _hostNameToSiteLanguage = null;
        _portWildcardExists = false;
        _hostWildcardExists = false;
    }

    // No longer part of IInitializableModule after RC1 (the engine now tracks initialization state)
    public bool IsInitialized
    {
        get;
        set;
    }

    public void Preload(string[] parameters)
    {
        throw new NotImplementedException();
    }

    #endregion
}

The code above makes use of the TerminateInitializationException to postpone the rest of the initialization until we reach the FirstBeginRequest event. At that point the entire Initialize method will be re-executed.

Recommendations for your Initialization Module

Allow for Initialize to be called multiple times
If you do multi-step initialization in your Initialize method, make sure that it will do the correct thing if it is re-executed because of an exception. A very simple example:

private bool _eventAttached;

public void Initialize(InitializationEngine context)
{
    if (!_eventAttached)
    {
        SomeClass.AnEvent += MyEventHandler;
        _eventAttached = true;
    }
    MethodThatMayThrowException();
}

This Initialize method may throw an exception after the event handler has been hooked up. The initialization system will re-invoke the Initialize method on the next request that reaches the web application, and if we did not protect the event hook-up with a flag, the handler would get added again.
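The following self-contained sketch simulates that retry. `SomeClass`, `RetryableModule` and `MethodThatMayThrowException` are hypothetical stand-ins for the names in the example above, and the `InitializationEngine` parameter is left out so the code compiles without EPiServer. The first call to Initialize fails after hooking up the event, and the retry does not hook it up a second time.

```csharp
using System;

// Hypothetical stand-in for the class exposing the event in the example above.
public static class SomeClass
{
    public static event EventHandler AnEvent;

    // Number of handlers currently attached to AnEvent.
    public static int SubscriberCount => AnEvent?.GetInvocationList().Length ?? 0;
}

public class RetryableModule
{
    private bool _eventAttached;
    private bool _failNextTime = true;   // simulate a transient failure on the first attempt

    public void Initialize()
    {
        if (!_eventAttached)
        {
            SomeClass.AnEvent += MyEventHandler;
            _eventAttached = true;
        }
        MethodThatMayThrowException();
    }

    private void MethodThatMayThrowException()
    {
        if (_failNextTime)
        {
            _failNextTime = false;
            throw new InvalidOperationException("Transient startup failure");
        }
    }

    private void MyEventHandler(object sender, EventArgs e) { }
}
```

Removing the `_eventAttached` check would leave two subscribers after the retry, which is the bug the recommendation guards against.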

The initialization engine will make sure that your code executes in a single-threaded manner.
No need for you to lock regions when dealing with shared state etc. Note that this guarantee is only made for Initialize / Uninitialize when executing thru the initialization system. If you have custom code that calls directly into your initialization module then you may need to deal with multi-threading issues.

Don't modify the IsInitialized property
Let the initialization system set the value of this property rather than modifying it yourself. Otherwise you may cause the initialization system to malfunction. (No longer relevant after RC1, since the IsInitialized property has been removed.)

Expose your initialization module with a static Instance property.
A convention that I suggest you follow is to create a static Instance property that simply exposes the current instance of your initialization module. This gives your code an easy way to access the initialization module, which would otherwise be hard to get at since it is created by the initialization engine.
Do a full implementation of Uninitialize
Anything done by Initialize should be undone by Uninitialize, and it should be undone in the reverse order of Initialize.
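One way to get the reverse order right without keeping two lists in sync is to record an undo action for each step as it is performed and replay them from a stack. This is a hypothetical pattern (`UndoTrackingModule` and its steps are made up for illustration), not something the initialization system requires.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical module that records an undo action for every step it performs,
// so that Uninitialize reverses the work in the opposite order.
public class UndoTrackingModule
{
    private readonly Stack<Action> _undo = new Stack<Action>();

    public List<string> Log { get; } = new List<string>();

    public void Initialize()
    {
        Step("register factory", () => Log.Add("unregister factory"));
        Step("attach event", () => Log.Add("detach event"));
        Step("set Instance", () => Log.Add("clear Instance"));
    }

    public void Uninitialize()
    {
        // The last step performed is the first one undone.
        while (_undo.Count > 0)
        {
            _undo.Pop()();
        }
    }

    private void Step(string description, Action undo)
    {
        Log.Add(description);   // the "real" work would happen here
        _undo.Push(undo);       // only pushed once the step has succeeded
    }
}
```

A side benefit is that a partially completed Initialize (one that threw halfway) can still be undone correctly, since only the completed steps are on the stack.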
Do not add logic to the Preload method
If you let Visual Studio implement the IInitializableModule interface the Preload method will be generated as "throw new NotImplementedException();". Please leave it like that. The Preload method has been added to support the "Always running" concept in ASP.NET 4 (only works with IIS 7.5 and later), but since there is currently no way of testing this code (it is never called by the initialization system in EPiServer CMS 6) you should not attempt to implement it. Note that it looks like we break this recommendation with our first example of an initialization module. In the EPiServer.Data initialization module the only thing we do on Initialize is to assign

Assembly scanning and filtering

When the initialization system looks for modules it relies on MEF to handle loading and composition. The basic idea is to scan all assemblies that are part of the application, with the exception of .NET Framework assemblies. This assembly list is what is exposed thru the EPiServer.Framework.Initialization.InitializationModule.Assemblies static property.

The EPiServer CMS plugin system has been updated to use the same list of assemblies as the initialization system. This is important to remember when you decide which assemblies should be scanned by EPiServer Framework.

Built into the EPiServer Framework we have a filtering mechanism that is based on two different attributes, the [PreventAssemblyScan] attribute and the [AllowAssemblyScan("Product")] attribute, as well as a configuration section.

To start with the simple part, it is quite obvious what [PreventAssemblyScan] will do. It is recommended that you add this attribute to your assembly if it contains neither initialization modules nor EPiServer CMS plugins. This attribute will help improve the startup time of your web application since scanning a large assembly is an expensive process.

Another option is to use the [AllowAssemblyScan("Product")] attribute. It is a bit more complicated since the purpose is also to exclude assemblies from the scanning process when possible, which may sound contrary to the name of the attribute.

An example should hopefully make it clear:

You create an assembly MyCmsPlugins.dll that contains EPiServer CMS plugins. You want to avoid having EPiServer Community scan the assembly for performance reasons. Add the attribute [AllowAssemblyScan("CMS")] to MyCmsPlugins.dll. This will include the assembly when scanned by EPiServer CMS, but not when scanned by EPiServer Community.

Now assume that you add a few Community extensions to the assembly, but still want to limit scanning to EPiServer CMS and EPiServer Community. Simply add the attribute [AllowAssemblyScan("Community")] in addition to [AllowAssemblyScan("CMS")].

Currently we only define the two strings "CMS" and "Community" to be used with [AllowAssemblyScan]. The "Community" part will be used by EPiServer Community 4 and "CMS" is used by EPiServer CMS 6.

Note that scanning for initialization modules is done regardless of the [AllowAssemblyScan] attribute. The attribute is only honoured by product-specific features, and EPiServer Framework is cross-product.
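The rules above can be summarized in a small, hypothetical model. `AssemblyInfo` and `ScanFilter` are illustrative types, not the EPiServer implementation. The assumption that [PreventAssemblyScan] also stops the scan for initialization modules is inferred from the recommendation to use it only on assemblies without initialization modules.

```csharp
using System.Collections.Generic;

// Hypothetical description of an assembly and the scan attributes applied to it.
public class AssemblyInfo
{
    public string Name { get; set; }
    public bool HasPreventAssemblyScan { get; set; }                                  // [PreventAssemblyScan]
    public List<string> AllowAssemblyScanProducts { get; } = new List<string>();     // one entry per [AllowAssemblyScan("...")]
}

public static class ScanFilter
{
    // Product-specific scanning, e.g. EPiServer CMS plugins ("CMS") or Community features ("Community").
    public static bool IsScannedByProduct(AssemblyInfo assembly, string product)
    {
        if (assembly.HasPreventAssemblyScan)
            return false;
        if (assembly.AllowAssemblyScanProducts.Count == 0)
            return true;   // no [AllowAssemblyScan] attribute: every product may scan it
        return assembly.AllowAssemblyScanProducts.Contains(product);
    }

    // Initialization module discovery is cross-product and ignores [AllowAssemblyScan].
    // Assumption: [PreventAssemblyScan] still excludes the assembly.
    public static bool IsScannedForInitializationModules(AssemblyInfo assembly)
    {
        return !assembly.HasPreventAssemblyScan;
    }
}
```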

Customizing assembly scanning with configuration

EPiServer is trying to reduce the amount of configuration as much as possible, but there are still optional configuration settings that you can use to customize the assembly scanning process. These settings are placed in the EPiServerFramework.config file. Note that the default configuration (see <configuration> / <configSections> in web.config) sets the restartOnExternalChanges attribute to false, i.e. changes to this file will not restart your web application.

In EPiServerFramework.config you will find a section:


<scanAssembly forceBinFolderScan="true" />

This section can be used to customize the assembly scanning process. It should be regarded as an additional filter on top of the assemblies normally loaded by ASP.NET as controlled by the <system.web> / <compilation> / <assemblies> section. Note that the bulk of the configuration usually resides in the system's web.config file.

If you want to exclude some specific assemblies from the normal scanning process as described in the previous section do the following:

<scanAssembly forceBinFolderScan="true">
   <add assembly="*" /> 
   <remove assembly="MyAssembly" /> 
   <remove assembly="MyAssembly2" /> 
</scanAssembly>

The <add assembly="*" /> directive includes all assemblies (apart from those filtered out by attributes as described above), and the <remove> directives then exclude MyAssembly and MyAssembly2. The second mode of usage is to scan only specific assemblies by adding configuration similar to this:

<scanAssembly forceBinFolderScan="true">
  <add assembly="EPiServer.Framework" /> 
  <add assembly="EPiServer.Data" /> 
  <add assembly="EPiServer.Events" /> 
  <add assembly="EPiServer.Shell" /> 
  <add assembly="EPiServer" /> 
  <add assembly="EPiServer.Enterprise" /> 
  <add assembly="EPiServer.Cms.Shell.UI" /> 
</scanAssembly>

This will exclude any other assemblies from being scanned. Note that the selection of assemblies above represents all assemblies delivered with EPiServer CMS 6 that have an initialization module, i.e. these assemblies must be present for EPiServer CMS 6 to work properly.
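As a rough model of the two modes, here is a hypothetical sketch that applies <add> and <remove> entries in order to the list of candidate assemblies (the assemblies ASP.NET would normally load). `ScanAssemblyConfig.Apply` is made up for illustration. Treating an empty section as "no extra filtering" and matching names case-insensitively are assumptions, not documented behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the <scanAssembly> add/remove rules (not the EPiServer implementation).
public static class ScanAssemblyConfig
{
    // candidates: assemblies ASP.NET would normally load.
    // entries: ("add" | "remove", assembly name or "*") in document order.
    public static List<string> Apply(IEnumerable<string> candidates,
                                     IEnumerable<(string Action, string Assembly)> entries)
    {
        var candidateList = candidates.ToList();
        var entryList = entries.ToList();
        if (entryList.Count == 0)
            return candidateList;   // assumption: no entries means no extra filtering

        // Assumption: assembly names are matched case-insensitively.
        var selected = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (var (action, assembly) in entryList)
        {
            var matches = assembly == "*"
                ? candidateList
                : candidateList.Where(c => string.Equals(c, assembly, StringComparison.OrdinalIgnoreCase)).ToList();
            foreach (var name in matches)
            {
                if (action == "add") selected.Add(name);
                else if (action == "remove") selected.Remove(name);
            }
        }

        // Preserve the candidates' original order in the result.
        return candidateList.Where(selected.Contains).ToList();
    }
}
```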

The InitComplete event

There are cases where you might want your initialization module to be called again after the initialization process is complete. A typical use case (borrowed from EPiServer Community 4) is attaching event handlers to an instance property that may be overridden by third party code.

To attach to the InitComplete event you could write your Initialize method like this:


public void Initialize(InitializationEngine context) 
{ 
    context.InitComplete += InitCompleteHandler; 
    StartThisModule(); 
}

When all initialization modules have executed the InitComplete event is raised. Note that the InitComplete event will be handled in a slightly non-standard way. When an event handler has executed without throwing an exception, the initialization system will remove it from the InitComplete event. This means that you should not detach from the InitComplete event in your Uninitialize method.

Why does the initialization system do such a strange thing? Simply to make sure that if an InitComplete event handler throws an exception, we can re-execute the InitComplete event on next request without re-invoking the already successfully executed event handlers.
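The handler-removal behaviour can be sketched with a plain .NET event and `Delegate.GetInvocationList`. `InitCompleteSketch` is a hypothetical type, not the actual InitializationEngine code. It raises the handlers one at a time and detaches each one that returns without throwing, so a retry only runs the handlers that have not yet succeeded.

```csharp
using System;

// Hypothetical sketch of the non-standard InitComplete handling.
public class InitCompleteSketch
{
    public event EventHandler InitComplete;

    // Returns true when all handlers have completed successfully.
    // An exception from a handler propagates and stops the loop; the remaining
    // handlers (including the failing one) stay attached for the next attempt.
    public bool RaiseInitComplete()
    {
        var handlers = InitComplete;
        if (handlers == null)
            return true;

        foreach (EventHandler handler in handlers.GetInvocationList())
        {
            handler(this, EventArgs.Empty);
            InitComplete -= handler;   // success: this handler is never called again
        }
        return InitComplete == null;
    }
}
```

This is also why a module should not detach from InitComplete in Uninitialize: once its handler has succeeded, it is no longer attached.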

Summary

This should hopefully explain most of the features of our new initialization system as well as how to use it. Please send feedback both on the feature itself and if there is anything in the blog posts that is not clear and I will try to fill in the gaps.

I hope that with this new system in place we can provide an even better platform for you to build and improve upon!

Introducing the new Initialization System in EPiServer CMS 6

2009-12-14T12:43:00.0000000Z

[NOTE - this article has been edited to show changes introduced after RC1. ~~Strikethrough~~ is used to indicate things no longer relevant after RC1]

For some background I suggest that you read the previous blog post in the series http://world.episerver.com/Blogs/Magnus-Strale/Dates/2009/12/Initialization-in-EPiServer-CMS-5/

This post describes what the initialization system is and how it works. How to implement your own initialization module will be the subject of a future blog post.

Why a new initialization system?

There are several ways to initialize modules in EPiServer CMS 5, but none were really designed as a general-purpose initialization system. The new initialization system (resides in assembly EPiServer.Framework, namespace EPiServer.Framework.Initialization) is designed and implemented with the purpose of being the sole initialization mechanism to use for both EPiServer internal code as well as third-party and custom modules.

What is it?

It is basically four things:

A discovery mechanism to determine which modules should be part of the initialization process.
A dependency sorting algorithm that decides the order of execution.
An execution engine that will execute the modules.
Hooks into ASP.NET to handle re-execution of initialization modules in the face of exceptions during startup.

The discovery mechanism

Locating the initialization modules is done by making use of Microsoft's new Managed Extensibility Framework (aka MEF). MEF will be a part of .NET 4, but for EPiServer CMS 6 we will ship a version running on .NET 3.5.

MEF primarily uses an attribute-based discovery system. We will scan all assemblies loaded in the current AppDomain for initialization modules. This will include all assemblies in the bin folder, since by default the ASP.NET config section <system.web> / <compilation> / <assemblies> contains a line <add assembly="*"/>. This causes the ASP.NET build system to pull in all assemblies in the bin folder.

Interesting note: The ASP.NET build system does not guarantee that all the assemblies listed in the <compilation> / <assemblies> section are actually loaded into the AppDomain when your web application starts executing. The loading of some assemblies can be delayed in order to speed up the display of the first request reaching ASP.NET. Unfortunately the initialization system is the first piece of code that we want to execute, and therefore we have added a small piece of code to force all assemblies in the bin folder to be loaded before the discovery process starts.
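A minimal sketch of forcing every managed assembly in a folder to load, assuming the folder is on the probing path (as the bin folder is for an ASP.NET application on the .NET Framework). `BinFolderLoader` is a hypothetical helper, not the code shipped with EPiServer CMS 6. It uses `AssemblyName.GetAssemblyName` to skip native DLLs and `Assembly.Load` to load by name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

public static class BinFolderLoader
{
    // Loads every managed *.dll in binFolder into the current AppDomain and returns
    // the simple names of the assemblies that were loaded (or were already loaded).
    public static List<string> LoadAll(string binFolder)
    {
        var loaded = new List<string>();
        foreach (var path in Directory.GetFiles(binFolder, "*.dll"))
        {
            try
            {
                AssemblyName name = AssemblyName.GetAssemblyName(path); // throws for native DLLs
                Assembly.Load(name);                                    // no-op if already loaded
                loaded.Add(name.Name);
            }
            catch (BadImageFormatException)
            {
                // Not a managed assembly (e.g. a native DLL): skip it.
            }
            catch (IOException)
            {
                // FileLoadException / FileNotFoundException: cannot be loaded by name here, skip it.
            }
        }
        return loaded;
    }
}
```

A missing folder is not swallowed: `Directory.GetFiles` runs outside the try block, so the caller sees the `DirectoryNotFoundException`.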

In the future we plan to do the same for all assemblies referenced by the <compilation> / <assemblies> section. This would allow us to support plugins / Initialization modules loaded purely from the GAC without having to keep the file in the bin folder as well. Unfortunately this feature did not make it in time for EPiServer CMS 6 code freeze.

In order to have a discoverable initialization module you need to add the [InitializableModule] or [ModuleDependency(...)] attribute to your class and the class should also implement the IInitializableModule interface.

The assembly scanning process supports some filtering logic to reduce the number of assemblies that we scan in order to speed up the initialization process. I will return to the filtering in the third post in this series.

The IInitializableModule interface

This interface is defined like this:


namespace EPiServer.Framework
{
    /// <summary>
    /// Interface that you can implement to be part of the EPiServer Framework initialization chain.
    /// </summary>
    /// <remarks>
    /// You should set the attribute [InitializableModule] on the class implementing this interface, or if
    /// you want to control dependencies (making sure that other modules are initialized before your module gets called)
    /// use the attribute [ModuleDependency(typeof(ClassThatIDependOn), ...)]. 
    /// </remarks>
    public interface IInitializableModule
    {
        /// <summary>
        /// Initializes this instance.
        /// </summary>
        /// <param name="context">The context.</param>
        /// <remarks>
        /// <para>
        /// Gets called as part of the EPiServer Framework initialization sequence. Note that it will be called
        /// only once per AppDomain, unless the method throws an exception. If an exception is thrown, the initialization
        /// method will be called repeatedly for each request reaching the site until the method succeeds.
        /// </para>
        /// <para>
        /// The "called once" guarantee uses the IsInitialized property as defined on this interface. The value of this
        /// property will be set by the EPiServer Framework initialization system and you should not set it directly.
        /// </para>
        /// </remarks>
        void Initialize(InitializationEngine context);
 
        /// <summary>
        /// Resets the module into an uninitialized state.
        /// </summary>
        /// <param name="context">The context.</param>
        /// <remarks>
        /// <para>
        /// This method is usually not called when running under a web application since the web app may be shut down very
        /// abruptly, but your module should still implement it properly since it will make integration and unit testing
        /// much simpler.
        /// </para>
        /// <para>
        /// Any work done by <see cref="Initialize"/> as well as any code executing on <see cref="InitializationEngine.InitComplete"/> should be reversed.
        /// </para>
        /// </remarks>
        void Uninitialize(InitializationEngine context);
 

        /// <summary>
        /// Gets or sets a value indicating whether this instance is initialized.
        /// </summary>
        /// <value>
        ///         <c>true</c> if this instance is initialized; otherwise, <c>false</c>.
        /// </value>
        /// <remarks>
        /// You should usually not set this property from your code. The default state should be false and then let
        /// EPiServer Framework initialization set it for you when the initialization is done. The reason for not
        /// setting it from your code is that EPiServer framework will execute in a locked region to avoid concurrency
        /// issues when updating the state of this property. This lock is not available outside the initialization system.
        /// </remarks>
        bool IsInitialized
        {
            get;
            set;
        }
 
        /// <summary>
        /// Preloads the module.
        /// </summary>
        /// <param name="parameters">The parameters.</param>
        /// <remarks>
        /// This method is only available to be compatible with "AlwaysRunning" applications in .NET 4 / IIS 7.
        /// It currently serves no purpose.
        /// </remarks>
        void Preload(string[] parameters);
    }
}

The comments contain most of what is worth mentioning.

Dependency sorting

Those of you that are familiar with EPiServer Community will probably recognize this discussion. The initialization system in EPiServer Framework has used a lot of the ideas in the EPiServer Community initialization system and merged the code so that EPiServer CMS 6 and EPiServer Community 4 will use the same initialization code. This should improve startup time for Relate+ systems (or any web application using both EPiServer CMS and EPiServer Community).

If you have a dependency on one or more existing initialization modules, then you should explicitly state this dependency by using the [ModuleDependency(typeof(ModuleThatIDependOn))] attribute. This is used by the dependency sorting to ensure that all modules are executed in the right order. Note that you can define multiple modules that you depend on.

If your module has no dependencies, which is an unlikely scenario, you can use the [InitializableModule] attribute. Most likely you will at least have a dependency on EPiServer.Web.InitializationModule, which sets up all EPiServer CMS internals in order for you to start using the EPiServer CMS API:s.

The modules are then sorted so as to guarantee that they are executed in dependency order. If you have defined circular dependencies, or a dependency on a non-existing module, you will receive an exception upon startup.
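The sorting step is essentially a topological sort. The following hypothetical sketch uses a depth-first search over module names (standing in for the types given to [ModuleDependency]) and throws in both failure cases mentioned above. `DependencySorter` is illustrative, not the EPiServer implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dependency sorter: each module name maps to the names of the modules it depends on.
public static class DependencySorter
{
    public static List<string> Sort(IDictionary<string, string[]> dependencies)
    {
        var result = new List<string>();
        var state = new Dictionary<string, int>(); // absent = unvisited, 1 = visiting, 2 = done

        void Visit(string module)
        {
            state.TryGetValue(module, out int s);
            if (s == 2) return;                                   // already placed in the result
            if (s == 1)
                throw new InvalidOperationException("Circular dependency involving " + module);
            if (!dependencies.TryGetValue(module, out var deps))
                throw new InvalidOperationException("Dependency on non-existing module " + module);

            state[module] = 1;
            foreach (var dep in deps)
                Visit(dep);                                       // dependencies first
            state[module] = 2;
            result.Add(module);                                   // all its dependencies are already in the list
        }

        foreach (var module in dependencies.Keys)
            Visit(module);
        return result;
    }
}
```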

A possible future optimization that I would like to see (I don't know if it is going to happen) is to create a dependency graph instead of a sorted list and execute the independent branches in parallel.

The execution engine

...which is actually named InitializationEngine, is responsible for executing the list of modules created by the dependency sorting algorithm. As described on the IInitializableModule.Initialize method (see above), we guarantee that the Initialize method gets called once and only once, unless the method throws an exception.

The initialization engine is a simple state machine that will log information about all state changes as well as report the initialization methods that have been executed.

In the case of an exception the system will stop executing initialization code and simply wait for the next incoming request and then retry the failing Initialization method. This behavior will make sure that the system does not start up in a partially initialized state.
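A hypothetical sketch of that retry behaviour: modules run in order, execution stops at the first exception, and the next call (standing in for the next incoming request) resumes with the module that failed instead of starting over. `RetryingEngine` is made up for illustration and leaves out locking, the real logging format and InitComplete.

```csharp
using System;
using System.Collections.Generic;

public class RetryingEngine
{
    private readonly IList<Action> _initializers;   // already in dependency order
    private int _next;                              // index of the first module not yet initialized

    public List<string> Log { get; } = new List<string>();

    public RetryingEngine(IList<Action> initializers) => _initializers = initializers;

    public bool IsComplete => _next == _initializers.Count;

    // Called at startup and then on each request until it completes.
    public void Run()
    {
        while (_next < _initializers.Count)
        {
            try
            {
                _initializers[_next]();
            }
            catch (Exception ex)
            {
                Log.Add($"Module {_next} failed: {ex.Message}; will retry on next request");
                throw;   // stop here: the site is never left half-initialized and running
            }
            Log.Add($"Module {_next} initialized");
            _next++;     // only advance past a module once it has succeeded
        }
    }
}
```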

After all the Initialization methods have been executed successfully an InitComplete event will be raised. This allows you to perform post-initialization actions. More details on this in the upcoming "Advanced topics" blog post.

How do we start the initialization system?

The new initialization system is called by the constructor in the EPiServer.Global base class, i.e. the class that you should inherit from in Global.asax.cs.

The reason we have placed the call to the initialization system at that specific point is simply that it is the first piece of code under our control that is invoked by ASP.NET.

Note that this will change with IIS 7.5/.NET 4 where you can have "Always running" web applications. See Scott Guthrie's blog post http://weblogs.asp.net/scottgu/archive/2009/09/15/auto-start-asp-net-applications-vs-2010-and-net-4-0-series.aspx

In the case of exceptions being thrown by a startup module we will retry the initialization at next BeginRequest event, which is basically the same model as we used for EPiServer CMS 5.

The big advantage of using the earlier initialization point is the fact that it is executed before Application_Start, which will allow us (in most cases - see the next paragraph) to have EPiServer CMS fully initialized and usable from within Application_Start.

Early initialization gotchas

If you remember information from http://world.episerver.com/Blogs/Magnus-Strale/Dates/2009/12/Initialization-in-EPiServer-CMS-5/ you might ask yourself how we can identify the correct EPiServer site section in web.config when there are cases where we do not have access to an incoming request.

Well, we can't.

This is where the TerminateInitializationException comes to the rescue. By throwing this exception from within your Initialization method you will stop execution of the initialization but the InitializationEngine will catch the exception to prevent an error message being displayed.

In the case of the built-in SiteMappingConfiguration module, this exception is thrown the first time your EPiServer CMS 6 application executes after installation. The execution will be resumed when we reach the BeginRequest event and the SiteMappingConfiguration will write the "Metabase path-to-siteId" mapping into the EPiServerFramework.config file. This information can then be used on future startups to determine which site section should be used prior to having an active HTTP request.
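The interplay can be sketched as follows. `TerminateInitializationSketchException`, `SiteIdModuleSketch` and `EngineSketch` are hypothetical stand-ins for TerminateInitializationException, SiteMappingConfiguration and the InitializationEngine, and the site id is simply passed in rather than derived from the Metabase path or host name. The engine treats the special exception as "postpone" rather than as an error.

```csharp
using System;

// Hypothetical stand-in for TerminateInitializationException.
public class TerminateInitializationSketchException : Exception
{
    public TerminateInitializationSketchException(string message) : base(message) { }
}

// Hypothetical, much simplified version of the site mapping logic.
public class SiteIdModuleSketch
{
    public string ConfigSiteId { get; set; }   // value previously saved to config, if any
    public string SiteId { get; private set; }

    // siteIdFromRequest is null when no HTTP request is available yet (early startup).
    public void Initialize(string siteIdFromRequest)
    {
        if (siteIdFromRequest == null && ConfigSiteId == null)
            throw new TerminateInitializationSketchException("Cannot determine siteId yet - wait for BeginRequest");

        SiteId = siteIdFromRequest ?? ConfigSiteId;
        ConfigSiteId = SiteId;                  // "save" the mapping for future startups
    }
}

public static class EngineSketch
{
    // Returns true if initialization completed, false if it was postponed.
    // Any other exception propagates and would be shown as an error.
    public static bool TryInitialize(SiteIdModuleSketch module, string siteIdFromRequest)
    {
        try
        {
            module.Initialize(siteIdFromRequest);
            return true;
        }
        catch (TerminateInitializationSketchException)
        {
            return false;                       // postponed silently, no error message displayed
        }
    }
}
```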

I think that will be enough for today. Stay tuned for more details and actual "How-to" information for writing your own initialization module.

Initialization in EPiServer CMS 5

2009-12-14T11:04:28.0000000Z

This is the first part of a three part series covering initialization in EPiServer CMS. This is intended to give you an overview of the current (CMS 5) state, while "Introducing the new Initialization System in EPiServer CMS 6" and "The new Initialization System - Advanced Topics" will cover new ground.

Such a simple task as starting your application actually requires a lot of support from your framework. In this case I use the term framework in a very wide sense, including the .NET Framework, IIS and the EPiServer CMS.

Just to list a few requirements:

The startup should be transparent to your application.
You should not have to care about the specifics of the EPiServer product, but treat the startup as if your application runs under plain ASP.NET.
If something goes wrong during the startup the application should fail consistently until the error condition goes away.
You should be able to plug in your own or other third party modules as part of the regular startup handling.
This means that hacking around with HTTP modules and attaching to BeginRequest etc should not be required, nor using the PlugInAttribute system (although that is a reasonably clean approach).
The startup should be able to make sure that different modules are initialized and started in the right order.
Startup should be fast.

Another sore point is the discrepancy between EPiServer CMS and EPiServer Community. These products are now moving closer together by virtue of the EPiServer Framework which is also the place for the new startup and initialization code.

Some background on EPiServer CMS initialization

In EPiServer CMS 5 we moved to a configuration system in which we support multiple site definitions in a single configuration file. Initially we used the IIS instance ID (or actually the Metabase path) to determine which site definition an EPiServer instance should use. We soon moved to using host name information from the incoming request instead, since using the Metabase path caused problems when running in load balanced environments with shared files (you had to make sure that the IIS instance ID was identical on all machines, which was a pain).

One drawback of using the host name of an incoming request is that we have to have an actual request to be able to determine the site section to use. Depending on the IIS version and some other factors this information was not always available as early as we would have wanted. This is what prevents you from calling most of the EPiServer API:s from within the Application_Start method.

EPiServer CMS 5 supports attaching event handlers from within Application_Start, but not much more. The rest of the code that you want to run on application startup has to be started by the FirstBeginRequest event, which was added to help address the Application_Start problem.

Unfortunately this breaks the transparency requirement above.

How to initialize your custom module

As noted above you can attach to the FirstBeginRequest event, but that requires you to add code to Application_Start which is usually not desirable for a re-usable module since it will require source code changes to the solution where you want to add the module.

Another approach is to write an HttpModule and attach to the FirstBeginRequest event from the HttpModule's Init method. This is a cleaner approach but requires changes to the web.config file (registering the HttpModule).

A third approach that has become quite widespread is to make use of the PlugIn system in EPiServer CMS. When you create a custom PlugIn attribute (by inheriting from the EPiServer.PlugIn.PlugInAttribute) and add a static Start method to that class, the Start method will be called by the EPiServer CMS plugin system at startup. The PlugIn system will scan all files in the bin folder and therefore you simply have to drop an assembly into the bin folder for this to work. See one of the many excellent blog articles that have been written about this approach, for example http://labs.episerver.com/en/Blogs/Allan/Dates/112230/3/When-and-Where-to-attach-DataFactory-Event-Handlers/

Using the plugin system just to get your code to initialize is however somewhat of a hack, although I consider it to be a "clean" hack (if such a thing exists).

There are also several blogs about using the PlugInAttribute approach together with Virtual Path Providers to give you a true single assembly module deployment. See for example http://labs.episerver.com/en/Blogs/Johano/Dates/2008/6/EPiServer-PlugIns-in-one-single-dll/ for more information on this subject.

To summarize: There are several ways to get your custom initialization code to execute, but none of them are truly designed to handle initialization in a robust way. The main issue is usually that a failure condition on startup will only be seen by the person issuing the first web request to the web application.

Soon to follow is "Introducing the new Initialization System in EPiServer CMS 6"

What do we do about config file bloat?

2009-10-22T08:07:31.0000000Z

It is always nice to see our dedicated partners give input / comments / criticism that confirms our internal discussions and decisions. See Fredrik Haglund's blog post about config file bloat here http://blog.fredrikhaglund.se/blog/2009/10/21/i-do-not-like-the-trend-for-episerver-webconfig/ I will now do an EPiServer first (*drum roll*) – I will post the verbatim guideline on dealing with configuration settings. This guideline was approved by the Product Integration Group on September 15.

Do not expect that this will have a huge impact on EPiServer CMS 6 since it is now in the final stages of development, but it will definitely affect our future development efforts.

Large configuration files

Basically every system that has externalized its configuration settings will at some point reach a stage where the amount of configuration is simply overwhelming and starts to cause more issues than the configurability was originally intended to solve. I think we are getting close to that point with EPiServer.

There are many reasons that this has happened. Just to name a few: we integrate with the ASP.NET settings and a lot of config overhead is caused by this integration, new features seem to add new configuration needs, and all possible config settings are entered in the configuration file regardless of whether they are at the default setting or not.

Some mitigating factors exist, such as splitting web.config into multiple configuration files to reduce the visual clutter, but we are still faced with a real problem.

What should we do?

Limit what you put into the configuration files. Basically everything (yes, there are exceptions) should have sensible defaults, and the only things that should appear in config files are exceptions to the default settings. This makes relevant changes easy to find and will help both customers and our support team to spot issues.

General guidelines

Does this feature need configuration at all?
Try to write code that is auto-configuring. This may come into conflict with limiting the amount of external dependencies, but, if possible, avoid configuration requirements completely.
Configuration should be optional.
Yes, there are exceptions to this guideline as well. For example, if you add functionality that needs configuration (see first guideline) to an existing feature that has an existing config section, then you would most likely want to add new config settings with default values to the config file in order to be consistent.
If no config settings exist, then sensible and secure defaults should be used. Always secure by default.
Configuration should be overridable in code (both internal and external).
The APIs that are used to inject the settings from configuration files should be public. For example, this requires providers based on the Framework class System.Configuration.ProviderBase to expose public ways of configuring the provider outside of the Initialize(string name, NameValueCollection config) method. Your code should be written to accept configuration changes after Initialize has been called.
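The guidelines can be illustrated with a small provider sketch, again in Java and assuming a hypothetical SampleProvider with two made-up settings. Every setting has a default (the security-relevant one defaults to the safe value), initialize only overrides keys that are actually present, and public setters let code change the settings before or after initialize, loosely mirroring what the guideline asks of providers derived from ProviderBase.

```java
import java.util.Map;

// Hypothetical provider: not an EPiServer or .NET API, just the guidelines in code form.
public class SampleProvider {
    private String name = "default";
    private int timeoutSeconds = 30;               // sensible default
    private boolean allowAnonymousAccess = false;  // secure by default

    // Analogous to ProviderBase.Initialize(name, config): only keys that are
    // present override the defaults, so config files list just the exceptions.
    public void initialize(String name, Map<String, String> config) {
        this.name = name;
        if (config == null) {
            return; // configuration is optional
        }
        String timeout = config.get("timeoutSeconds");
        if (timeout != null) {
            setTimeoutSeconds(Integer.parseInt(timeout));
        }
        String anonymous = config.get("allowAnonymousAccess");
        if (anonymous != null) {
            setAllowAnonymousAccess(Boolean.parseBoolean(anonymous));
        }
    }

    public String getName() { return name; }
    public int getTimeoutSeconds() { return timeoutSeconds; }
    public boolean isAllowAnonymousAccess() { return allowAnonymousAccess; }

    // Public setters: code and tests can configure the provider without a config file,
    // and changes are accepted after initialize() has been called.
    public void setTimeoutSeconds(int seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("timeoutSeconds must be positive");
        }
        this.timeoutSeconds = seconds;
    }

    public void setAllowAnonymousAccess(boolean allow) {
        this.allowAnonymousAccess = allow;
    }

    public static void main(String[] args) {
        SampleProvider provider = new SampleProvider();
        provider.initialize("sample", Map.of("timeoutSeconds", "60")); // only the exception is configured
        provider.setAllowAnonymousAccess(true); // overridden in code after initialize()
        System.out.println(provider.getTimeoutSeconds() + " " + provider.isAllowAnonymousAccess()); // prints 60 true
    }
}
```

Routing the config file values through the same public setters that code uses keeps validation in one place, which is also what makes the provider easy to set up in unit tests.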

These guidelines will introduce new challenges, but ultimately I think we and our customers will benefit from them. Just adding configuration capabilities "to increase flexibility" is not always a good thing; increased flexibility = increased complexity. Always having public configuration APIs will also help in unit and integration testing scenarios.

The pain of success

2009-10-16T10:59:00.0000000Z

The heading might come off as slightly arrogant, but it is way more catchy than "The pain of trying to stay binary compatible between releases". EPiServer CMS has been very successful with a large number of installations. With every new release we want as many customers as possible to upgrade, because every new release is always the best ever! If we didn't believe that, we wouldn't release it.

Overview

There have been a few occasions where the upgrade path was not very smooth (*cough*), for instance between EPiServer CMS 4 and CMS 5. However, that step was an exception: very early in the EPiServer CMS 5 development plan we decided that anything we planned to break compatibility-wise (mainly to take advantage of new features in .NET 2.0) should be done in the first release of EPiServer CMS 5. However, there have been other releases where changes have affected compatibility in undesirable ways. Since CMS 5 R2 we have a much stricter automated API checker in place that gives warnings as soon as we break compatibility at the binary level.

Maintaining binary compatibility simply means that upgrading EPiServer CMS will not require recompilation of your existing website. This is what our partners and customers want, so we work very hard to achieve this goal. With a lot of existing sites, even the smallest glitch can affect a lot of customers (the pain of success...).

The details

I have just finished work on a change that involved some fancy footwork to keep compatibility and improve both functionality and performance at the same time. Some background: the Friendly URL (FURL for short) feature in EPiServer CMS gives you readable, nice-looking URLs that mirror the structure of your site. It is also a feature that consumes quite a lot of CPU for each request (somewhere between 5 and 25%).

We have worked on improving the performance and immediately ran into compatibility problems when rewriting the FriendlyUrlRewriteProvider (FURP) class. The easy solution is to create a new provider and leave the old one in place until we can remove it completely, a very nice benefit of the provider model. This is what we did, and thus the HierarchicalUrlRewriteProvider was born, which basically does the same job as the FriendlyUrlRewriteProvider.

But when you use FURL you might have parts of your URL namespace that you do not want to be rewritten, such as a special file. The solution has been to add these paths to the UnTouchedPaths property on FriendlyUrlRewriteProvider (or you can attach event handlers, but that is another story).

Here is a typical compatibility problem: the property lives on the FURP class and we know that there is code "in the wild" that uses this property. Since HierarchicalUrlRewriteProvider should have this feature as well, why not simply delegate to FURP? ...or even inherit from FURP? Remember that we said that we eventually want to remove FURP; inheriting from it would mean breaking binary compatibility when FURP dies.

As it is a feature that we want both FURP and our new provider to share, it makes sense to move it to a common base class. The UrlRewriteProvider class is that base class. However, simply moving UnTouchedPaths to UrlRewriteProvider is not enough. The property is simply a list of strings that is scanned linearly for matching paths, and this immediately translates to poor performance as soon as the number of entries starts to increase. It is also limited to matching complete paths, and we wanted to expand the feature to match entire directory structures as well.

We ended up implementing three methods and one property on UrlRewriteProvider:

void AddExcludedPath(string path) - Does what the API says. The twist is that paths ending with a slash will be treated as directories and anything that starts with that string will be excluded. Other paths will be complete matches.
bool IsExcludedPath(string path) - Returns true if the path matches anything added with AddExcludedPath
void ClearExcludedPaths() - Once again quite self-explanatory.
IEnumerable<string> ExcludedPaths - Expose what has been added with AddExcludedPath
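A minimal sketch of these four members, written in Java with Java naming conventions and assuming nothing about the real UrlRewriteProvider internals. Exact paths and directory paths (those ending with a slash) go into separate hash sets, so a lookup checks the exact set once and then each slash-terminated prefix of the path. The cost then grows with the length of the path rather than with the number of exclusions, which was the problem with scanning UnTouchedPaths linearly. Matching is case-sensitive here, which may differ from the actual implementation.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the shared base class; FURP and HierarchicalUrlRewriteProvider would both extend it.
public abstract class UrlRewriteProviderSketch {
    private final Set<String> excluded = new LinkedHashSet<>();  // everything added, in insertion order
    private final Set<String> exactPaths = new HashSet<>();      // entries that must match completely
    private final Set<String> directoryPaths = new HashSet<>();  // entries ending with '/'

    public void addExcludedPath(String path) {
        excluded.add(path);
        if (path.endsWith("/")) {
            directoryPaths.add(path);
        } else {
            exactPaths.add(path);
        }
    }

    public boolean isExcludedPath(String path) {
        if (exactPaths.contains(path)) {
            return true;
        }
        // Check every prefix ending in '/' ("/", "/abc/", "/abc/def/", ...): a path starts
        // with a directory entry exactly when one of these prefixes equals that entry.
        for (int i = path.indexOf('/'); i >= 0; i = path.indexOf('/', i + 1)) {
            if (directoryPaths.contains(path.substring(0, i + 1))) {
                return true;
            }
        }
        return false;
    }

    public void clearExcludedPaths() {
        excluded.clear();
        exactPaths.clear();
        directoryPaths.clear();
    }

    // Read-only view of what has been added with addExcludedPath.
    public Iterable<String> getExcludedPaths() {
        return Collections.unmodifiableSet(excluded);
    }

    public static void main(String[] args) {
        UrlRewriteProviderSketch provider = new UrlRewriteProviderSketch() {}; // anonymous subclass for the demo
        provider.addExcludedPath("/abc/");
        provider.addExcludedPath("/robots.txt");
        System.out.println(provider.isExcludedPath("/abc/somefile.htm")); // true (directory match)
        System.out.println(provider.isExcludedPath("/robots.txt"));       // true (exact match)
        System.out.println(provider.isExcludedPath("/robots.txt.bak"));   // false
        System.out.println(provider.isExcludedPath("/abcd/page.htm"));    // false
    }
}
```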

That takes care of the needs for our new feature, but how do we match this with the existing UnTouchedPaths property on FURP? We fake it, or, more correctly, we created a new private class AddOnlyList that implements IList<string> (which is the type exposed by the UnTouchedPaths property). This class is not a complete list implementation; it only supports Add, Clear, Contains, Count and GetEnumerator. These implementations delegate to the methods/property on UrlRewriteProvider as described above. Other methods/properties will throw a NotImplementedException.
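The adapter idea can be sketched in Java by extending AbstractList, whose optional operations (such as remove(int)) already throw UnsupportedOperationException, Java's closest analogue to NotImplementedException. ExcludedPathStore is a hypothetical, deliberately simplified stand-in for the excluded-path methods on UrlRewriteProvider (it scans linearly for brevity); the usage in main also shows the semantic difference discussed in the next paragraph, where Contains matches files inside an added directory.

```java
import java.util.AbstractList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class AddOnlyListSketch {

    // Hypothetical, simplified stand-in for the excluded-path members on UrlRewriteProvider.
    static class ExcludedPathStore {
        private final Set<String> paths = new LinkedHashSet<>();

        void addExcludedPath(String path) { paths.add(path); }

        boolean isExcludedPath(String path) {
            for (String p : paths) {
                // Entries ending in '/' match whole directories; other entries must match exactly.
                if (p.endsWith("/") ? path.startsWith(p) : path.equals(p)) {
                    return true;
                }
            }
            return false;
        }

        void clearExcludedPaths() { paths.clear(); }

        Set<String> excludedPaths() { return paths; }
    }

    // List view supporting only add, clear, contains, size and iteration, all delegating
    // to the store; other operations throw UnsupportedOperationException.
    static class AddOnlyList extends AbstractList<String> {
        private final ExcludedPathStore store;

        AddOnlyList(ExcludedPathStore store) { this.store = store; }

        @Override public boolean add(String path) { store.addExcludedPath(path); return true; }
        @Override public void clear() { store.clearExcludedPaths(); }
        @Override public boolean contains(Object o) { return o instanceof String && store.isExcludedPath((String) o); }
        @Override public int size() { return store.excludedPaths().size(); }
        @Override public Iterator<String> iterator() { return Collections.unmodifiableSet(store.excludedPaths()).iterator(); }
        @Override public String get(int index) { throw new UnsupportedOperationException("get is not supported"); }
    }

    public static void main(String[] args) {
        ExcludedPathStore provider = new ExcludedPathStore();
        List<String> unTouchedPaths = new AddOnlyList(provider); // what the old property would return
        unTouchedPaths.add("/abc/");
        System.out.println(unTouchedPaths.contains("/abc/somefile.htm")); // true, unlike a plain list
        System.out.println(provider.isExcludedPath("/abc/other.htm"));     // true
        try {
            unTouchedPaths.remove(0);
        } catch (UnsupportedOperationException e) {
            System.out.println("remove is not supported");
        }
    }
}
```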

We now have binary compatibility (UnTouchedPaths and FURP have the same public API as before) and improved functionality (ExcludedPath with directory support). What we don't have is complete semantic compatibility: there are holes in the List implementation returned by UnTouchedPaths, and the actual behavior of the excluded paths is a bit different with directory matching. For example, if we do
FriendlyUrlRewriteProvider.UnTouchedPaths.Add("/abc/");

then calling

FriendlyUrlRewriteProvider.UnTouchedPaths.Contains("/abc/somefile.htm");

would return true, whereas a plain list would return false since that exact string was never added.

The solution is not perfect, but solves the vast majority of problems related to the UnTouchedPaths property.

What about the HierarchicalUrlRewriteProvider, you may ask? Well, that will be the subject of another blog post.

Magnus Stråle