Event System Bug in CMS 6

Cute Bug There is a (minor) bug in the EPiServer Event Management System when using tcp – this blog post describes how the bug behaves and what we did to fix the problem.

For those new to EPiServer, the event system provides a mechanism for distributing events within an EPiServer CMS site, between EPiServer CMS sites on the same physical server (enterprise), and between EPiServer CMS sites on separate servers (load-balanced standard and enterprise). Read more about it here.

If my case the customer had 3 servers, one backend and two frontend servers, all running EPiServer CMS 6 (6.0.530.1), and we were using TCP instead of UDP.

A while after first setup of our test environment, we started to find warn messages in the log files:
Exception calling IEventReplication::RaiseEvent on remote object. System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host.

How the bug manifests itself

Backend server starts
Frontend server starts
Content change on backend, event sent to frontend. Cache invalidation OK
Frontend recycles
Content change on backend, event sent to frontend.
Error, connection forcibly closed. Cache not invalidated. Warning message in log file.
Content change on backend, event sent to frontend. Cache invalidation OK.

As you can see, the cache update in step 5 is lost. Next update (step 6) will work as the connection is recreated and the whole cache is cleared. In other words, this will only affect first content change after a restart of front server(s).

How we got around it
We solved the problem by using wsHttpBinding instead of netTcpBinding. This is a bit problematic, as it runs a security check before opening a port (while tcp don’t). This means that you need to run the following command on the frontend servers (30001 is port used for cache invalidation, + equals “localhost” and IIS_IUSRS is application pool user):

netsh http add urlacl url=http://+:30001/ user=IIS_IUSRS

The problem has been registered as a bug by EPiServer support:
#64186: Cache update fail at first time publish in load balance environment using tcp.

May 04, 2011

Comments

smithsson68@gmail.com May 4, 2011 03:33 PM

Not sure how this can be a bug in the code as it's the same code regardless of which WCF protocol is used.

By the way, the events system is also used for Community from V4.

May 6, 2011 08:27 AM

Good question Paul, and I guess it depends.

However, the event system could be made more robust by checking the state of the connection in case of an exception, and retry the send if the connection has been closed (from what I've been able to google, this should be possible.)

What we can conclude is that the cache invalidation does not work all that well with tcp - out of the box. Udp is not a suitable option for most secure environments, and network admins tend to get this funny look when I tell them they need to allow UDP broadcast through the firewall (it is not that common to do this).

The event system is great! I've used it for custom cache invalidation, and it is easy to do. All our products, and 3rd party products should piggy back on it to get the messages across.

Btw. it seems EPiServer Commerce is also using it now. Good thing!

May 6, 2011 09:48 AM

Mari,

Though I do not know what the event system is used for in this case, but assuming it's out of the box code - the core issue is that the event being sent never reaches the frontend?
Because the recycle of the frontend server would probably purge everything from the cache anyways so the content itself would be fine.

Paul: having troubleshooted (in progress) the event system, using three different bindings i'd say that things aren't as easy as just switching the bindings/protocols and things still work.

For example there are certain bindings that do not seem to work well with the system probably due to the usage of inheritance in the classes of the objects being transferred over the wire.

Jan 30, 2012 10:05 PM

Shamrez, the change on the back-end would typically occur after the recycle on the front-end, and the cache invalidation event on the front end is not picked up, so the content is out of date until the next publish or recycle (whatever comes first).

There is a new version of CMS6 R2 on the Download page, containing quite a few bug fixes, including this one. For the full list, see http://world.episerver.com/PageFiles/110367/BugList.txt

Please login to comment.