Implementing Dublin Core meta data for EPiServer pages via a plug-in
Dublin Core is in heavy use in the UK public sector – anybody who has had to complete an RFP document for a UK public sector customer will be familiar with questions regarding your proposed CMS platform’s support for it.
Dublin Core is a meta data standard that is used to describe resources across different platforms and technologies in a way that makes them easy to find. It is a vocabulary of descriptive terms rather than a file format, so it can be expressed in XML, RDF or HTML meta tags, and it is used to describe a wide range of resources, including video, images, text and (perhaps most commonly) web pages.
The Dublin Core Metadata Element Set (DCMES), often referred to as Simple Dublin Core, is composed of fifteen elements, which include simple descriptive items such as title, creator, subject, description and date. These basic elements have been extended with a number of other terms and qualified with a set of recommended encoding schemes, such as ISO 8601 for date formats. Dublin Core is implemented on web pages as a set of meta tags that conform to the published Dublin Core schema, although the mixture of tags used tends to vary considerably between different sites.
A typical set of Dublin Core tags looks something like the example below:
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.title" lang="en" content="Home page" />
<meta name="DC.creator" content="Ben Morris" />
<meta name="DC.publisher" content="Acme Corporation" />
<meta name="DC.date" scheme="ISO8601" content="2010-04-23T16:27" />
<meta name="DC.type" scheme="DCMIType" content="Text" />
<meta name="DC.format" scheme="IMT" content="text/html" />
<meta name="DC.identifier" content="/dublin-core-meta-data-plug-in-for-episerver-pages/" />
<meta name="DC.language" scheme="RFC1766" content="en" />
<meta name="DC.coverage" content="World" />
<meta name="DC.rights" content="/terms-and-conditions/" />
The content for all of these tags can be easily generated from EPiServer. This post explains a technique for developing an EPiServer plug-in that automatically inserts Dublin Core meta tags into EPiServer pages. The plug-in can be dropped into an existing site without requiring any code changes to your site or templates.
This plug-in works via an http module, which intercepts requests for EPiServer pages immediately after the content has been processed, formats a set of meta tags based on the page’s EPiServer properties and inserts the tags just before the page’s closing </head> tag. A code sample of the whole thing can be downloaded from here (ZIP archive, 7KB), but the following is an explanation of the mechanics.
Creating the http module
An http module allows you to hook into a number of events that are raised during the http request cycle. The event that we are interested in here is the PostRequestHandlerExecute event, which occurs just after the page has finished execution and the HTML content is ready to be sent back to the browser.
Creating an http module is pretty straightforward – you create a class that implements the IHttpModule interface and register it in web.config. This interface requires two methods: Init(), where you hook up any events that you want to execute code on, and a Dispose() method to clean up any resources that you have used.
The code sample below shows a basic module that hooks into the PostRequestHandlerExecute event.
using System;
using System.Web;
using EPiServer;
using EPiServer.Core;
namespace EPiPlugins.DublinCore
{
    public class DCHttpModule : IHttpModule
    {
        public void Init(HttpApplication context)
        {
            //* Subscribe to PostRequestHandlerExecute
            context.PostRequestHandlerExecute += new EventHandler(context_PostRequestHandlerExecute);
        }
        public void context_PostRequestHandlerExecute(object sender, EventArgs e)
        {
            //* Do some work...
        }
        public void Dispose()
        {
            //* Required by IHttpModule - this module holds no resources, so there is nothing to clean up
        }
    }
}
The event handler for the PostRequestHandlerExecute event will have to do the following work:
- Get the current application context – this will give you access to the Request and Response objects.
- Check that the current URL refers to an EPiServer page and ignore everything else (note that every request handled by ASP.NET will pass through your http module – this includes javascript files, images and style sheets).
- Get the PageData object for the current page.
- Format some Dublin Core meta data on the basis of the PageData object.
- Filter the out-going response in order to insert the meta data.
The basic outline of this event handler will look something like this:
public void context_PostRequestHandlerExecute(object sender, EventArgs e)
{
    if (sender != null)
    {
        //* Get the application reference - this allows us into the HttpContext
        HttpApplication senderApp = (HttpApplication)sender;
        //* Only process aspx pages
        if (senderApp.Context.Request.Url.AbsoluteUri.IndexOf(".aspx", StringComparison.InvariantCultureIgnoreCase) >= 0)
        {
            //* Try to get a page reference from the URL
            PageReference currentPageRef = PageReference.ParseUrl(senderApp.Context.Request.Url.AbsoluteUri);
            //* If there's no PageReference then this is not a request for an EPiServer page, so ignore it
            if (currentPageRef != PageReference.EmptyReference)
            {
                //* Fetch the page data
                PageData thisPage = DataFactory.Instance.GetPage(currentPageRef);
                if (thisPage != null)
                {
                    //* We have page data - format some Dublin Core tags based on the page data
                    string tags = FormatTags(thisPage);
                    //* Apply the DCStream filter to the response and it will insert the Dublin Core meta tags
                    senderApp.Context.Response.Filter = new DCStream(senderApp.Context.Response.Filter, tags);
                }
            }
        }
    }
}
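The FormatTags() method is not shown in this post. As a rough idea of what the formatting step involves, here is a minimal sketch that takes plain values rather than a PageData object. The class name DublinCoreFormatter and its parameter list are hypothetical; the caller would pass in values read from the page, such as thisPage.PageName, thisPage.CreatedBy, thisPage.Changed and thisPage.LinkURL. It uses WebUtility.HtmlEncode, which needs .NET 4 or later; on older frameworks HttpUtility.HtmlAttributeEncode does a similar job.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Text;

// Hypothetical helper - a sketch only, not the code from the download.
public static class DublinCoreFormatter
{
    // Appends a single <meta> element, HTML-encoding the content value.
    // Empty values are skipped so that no blank tags are emitted.
    private static void AppendMeta(StringBuilder sb, string name, string content, string scheme)
    {
        if (string.IsNullOrEmpty(content)) return;
        sb.Append("<meta name=\"DC.").Append(name).Append('"');
        if (scheme != null) sb.Append(" scheme=\"").Append(scheme).Append('"');
        sb.Append(" content=\"").Append(WebUtility.HtmlEncode(content)).Append("\" />\n");
    }

    // Builds a small block of Dublin Core tags from plain values.
    public static string FormatTags(string title, string creator, DateTime changed, string url, string language)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("<link rel=\"schema.DC\" href=\"http://purl.org/dc/elements/1.1/\" />\n");
        AppendMeta(sb, "title", title, null);
        AppendMeta(sb, "creator", creator, null);
        // ISO 8601 date, formatted independently of the server culture
        AppendMeta(sb, "date", changed.ToString("yyyy-MM-dd'T'HH:mm", CultureInfo.InvariantCulture), "ISO8601");
        AppendMeta(sb, "identifier", url, null);
        AppendMeta(sb, "language", language, "RFC1766");
        return sb.ToString();
    }
}
```

Encoding the content matters because page names and author names are editor-supplied and may contain quotes or ampersands that would otherwise break the markup.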
Creating a stream wrapper class to filter the content
Note that at the end of the PostRequestHandlerExecute event we set the Filter property of the current Response object to a new instance of a class called DCStream. This is where we wrap the http response stream into a custom object that allows us to modify the entire response before it goes to the client.
The DCStream class is a wrapper around the Stream object that manages an internal representation of the response stream, inserting the Dublin Core tags via an override of the Write() method as shown below:
public override void Write(byte[] buffer, int offset, int count)
{
    string strBuffer = System.Text.Encoding.UTF8.GetString(buffer, offset, count);
    //* Look for the closing HEAD tag
    int insertionPoint = strBuffer.IndexOf("</head>", StringComparison.InvariantCultureIgnoreCase);
    if (insertionPoint >= 0)
    {
        //* Add the Dublin Core tags in to the content
        strBuffer = strBuffer.Insert(insertionPoint, _DCTags);
    }
    //* Look for the closing HTML tag
    if (strBuffer.IndexOf("</html>", StringComparison.InvariantCultureIgnoreCase) < 0)
    {
        _ResponseHtml.Append(strBuffer);
    }
    else
    {
        //* Transform the response and write it back out
        _ResponseHtml.Append(strBuffer);
        string finalHtml = _ResponseHtml.ToString();
        byte[] data = System.Text.Encoding.UTF8.GetBytes(finalHtml);
        _ResponseStream.Write(data, 0, data.Length);
    }
}
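Only Write() is shown above; the rest of the class is in the download. As a rough sketch of how the surrounding class might fit together (this is not the downloadable code itself), the version below derives from Stream, stubs out the members that a write-only filter should not need and buffers the output. It is a deliberate variation on the method above: it searches the accumulated buffer rather than each chunk, so a </head> tag that is split across two Write() calls is still found. It assumes UTF-8 output and that no multi-byte character is split across chunks, and like the original it only emits content once </html> has been seen.

```csharp
using System;
using System.IO;
using System.Text;

// Sketch of a write-only response filter; field names follow the snippet above.
public class DCStream : Stream
{
    private readonly Stream _ResponseStream;   // the filter we are wrapping
    private readonly string _DCTags;           // pre-formatted meta tags
    private readonly StringBuilder _ResponseHtml = new StringBuilder();

    public DCStream(Stream responseStream, string dcTags)
    {
        _ResponseStream = responseStream;
        _DCTags = dcTags;
    }

    // The stream is write-only, so reading and seeking are not supported.
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Flush() { _ResponseStream.Flush(); }

    public override void Write(byte[] buffer, int offset, int count)
    {
        // Accumulate the chunk, then wait until the whole document has arrived
        _ResponseHtml.Append(Encoding.UTF8.GetString(buffer, offset, count));
        string html = _ResponseHtml.ToString();
        if (html.IndexOf("</html>", StringComparison.OrdinalIgnoreCase) < 0) return;

        // Insert the tags just before the closing HEAD tag, if there is one
        int insertionPoint = html.IndexOf("</head>", StringComparison.OrdinalIgnoreCase);
        if (insertionPoint >= 0) html = html.Insert(insertionPoint, _DCTags);

        byte[] data = Encoding.UTF8.GetBytes(html);
        _ResponseStream.Write(data, 0, data.Length);
        _ResponseHtml.Length = 0; // clear the buffer so nothing is written twice
    }
}
```

Calling ToString() on every chunk is wasteful for very large pages, so a production version would want a cheaper way of spotting the end of the document.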
Adding your http module to web.config
To get your website to pick up the http module you need to add a reference to it to your web project and then add a declaration into the system.webServer/modules section of your web.config file. This declaration will look like this:
<add name="DCHttpModule" type="EPiPlugins.DublinCore.DCHttpModule, EPiPlugins.DublinCore" />
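For context, the fragment below sketches where the declaration sits in web.config, with everything else in the file left out. The system.webServer/modules section is read by IIS 7 in integrated pipeline mode. Under IIS 6, or IIS 7 in classic mode, modules are registered in system.web/httpModules instead, so register the module in whichever one of the two sections matches your environment.

```xml
<configuration>
  <!-- IIS 6, or IIS 7 classic pipeline: use this section only -->
  <system.web>
    <httpModules>
      <add name="DCHttpModule" type="EPiPlugins.DublinCore.DCHttpModule, EPiPlugins.DublinCore" />
    </httpModules>
  </system.web>
  <!-- IIS 7 integrated pipeline: use this section only -->
  <system.webServer>
    <modules>
      <add name="DCHttpModule" type="EPiPlugins.DublinCore.DCHttpModule, EPiPlugins.DublinCore" />
    </modules>
  </system.webServer>
</configuration>
```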
Configuration
Not every field in the Dublin Core set has a direct equivalent in the EPiServer PageData object, so you will need to map some of the fields onto EPiServer page properties. You may want to map the Dublin Core “description” element onto a specific EPiServer property that will be present on all EPiServer pages, or assert a particular copyright message for the “rights” element for every page.
There are a number of ways to implement this kind of configuration – I have used a custom configuration section in my code example that lets you define your own section in the web.config file.
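As an illustration of the custom configuration section approach, here is a minimal sketch. The section name dublinCore, the class name DublinCoreSection and both attribute names are hypothetical and will not match the downloadable sample; MetaDescription is simply assumed to be a property that exists on every page type.

```csharp
using System.Configuration;

// Hypothetical section - names are illustrative only.
// Assumed web.config registration:
//   <configSections>
//     <section name="dublinCore" type="EPiPlugins.DublinCore.DublinCoreSection, EPiPlugins.DublinCore" />
//   </configSections>
//   <dublinCore descriptionProperty="MetaDescription" rights="/terms-and-conditions/" />
public class DublinCoreSection : ConfigurationSection
{
    // Name of the EPiServer page property that feeds DC.description
    [ConfigurationProperty("descriptionProperty", DefaultValue = "MetaDescription")]
    public string DescriptionProperty
    {
        get { return (string)this["descriptionProperty"]; }
    }

    // Fixed value used for DC.rights on every page
    [ConfigurationProperty("rights", DefaultValue = "")]
    public string Rights
    {
        get { return (string)this["rights"]; }
    }

    // Reads the section from web.config, falling back to the defaults if it is absent
    public static DublinCoreSection Load()
    {
        return (DublinCoreSection)ConfigurationManager.GetSection("dublinCore") ?? new DublinCoreSection();
    }
}
```

The tag formatting code could then call DublinCoreSection.Load() once and look up the configured property name on each PageData object.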
Code sample
You can download a full working sample of the code from here (ZIP archive, 7KB).
Bear in mind that this code is a proof of concept rather than a match-fit piece of production code.
Note that the references for the sample project are set to EPiServer 6 DLLs (v 6.0.530), but it works on EPiServer 5 too.
Why not just use a plain old webcontrol in the master page? Too simple?
/ Patrik Akselsson
I agree with Patrik. .aspx pages _could_ also be used for generating images, styles and javascript, which would break when using this (without modifying it).
/ Björn Olsson
The rationale for the approach is to create an implementation that can be easily shipped as a separate DLL and dropped into a site without having to make any changes to templates. This kind of modular re-usability becomes pretty important if you're having to develop multiple EPi sites.
Also, although ASPX pages can indeed be used to serve up pretty much anything, the code does carefully check that an EPi page is being requested before doing any work.