Mirroring - Configuration and Operation

Product version: EPiServer CMS 5 SP1
Document version: 1.1
Document creation date: 07-06-2006
Document last saved: 21-01-2008

Introduction

This document contains a description of content mirroring and explains how to configure mirroring. The document also contains a test scenario and a detailed troubleshooting chapter.

Prerequisites

The following is required to be able to configure mirroring correctly.

  • Both sites must run EPiServer CMS 5 R1 or later.
  • Both sites must have page types with the same name. For the example in this document, both page types should share at least the "MainBody" property so that there is a property to display externally.

It is also advisable to have the following installed.

  • Latest service packs for your operating system. The issue described in the KB article at http://support.microsoft.com/kb/886461/en-us can affect mirroring in EPiServer CMS 5.
  • Latest service packs for .NET Framework 2.0 and 3.0.

Basic Concepts

Channel

In EPiServer CMS 5, the content that should be mirrored from the Web site is defined in channels. You can define multiple channels in one EPiServer CMS 5 site. Each channel contains properties that define the pages to be included in the channel. A channel does not know or care where the pages are delivered; it only makes sure that the underlying publisher receives information about the changes according to the channel properties.

Note: It is important to understand that the channel mirrors what it can see, not the actual pages. This means that access rights, filters, publish dates, etc. can be used to obtain a customized view of data.

Channel Type

One of the major properties of a channel is the channel type. There are three types of channel to choose from:

  • Tree - The tree model finds all changes made to a tree (move, delete, update, create). All these changes will be sent to all destinations. The tree model always finds differences by starting at the root page and recursing the tree. This means that a sub-page will never be included if the parent is not included. If you exclude a page (by using a filter), all sub-pages will also be excluded. This type is primarily used to be able to mirror a tree structure where the target will be a replica tree structure of the source.
  • List - The list model works exactly the same way as a tree model with the difference that it won’t recurse after the first level.
  • Search - The search model will only find changed (or "marked as changed") or new pages. Delete and move operations are not intercepted. This model is specifically designed to be able to mirror pages regardless of location to a single destination/list. This is useful for scenarios when news should be exported to another system (news service or another global news list in another EPiServer CMS 5 system).
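The difference between the three channel types can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not EPiServer code: the Page class, the include filter callback and all function names are hypothetical, and the start page is always included here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    # Hypothetical stand-in for a CMS page.
    page_id: int
    name: str
    changed: bool = False
    children: List["Page"] = field(default_factory=list)


def tree_pages(root: Page, include: Callable[[Page], bool] = lambda p: True) -> List[Page]:
    """Tree: recurse from the root; an excluded page hides its whole subtree."""
    if not include(root):
        return []
    found = [root]
    for child in root.children:
        found.extend(tree_pages(child, include))
    return found


def list_pages(root: Page, include: Callable[[Page], bool] = lambda p: True) -> List[Page]:
    """List: like Tree, but stop after the first level below the root."""
    if not include(root):
        return []
    return [root] + [child for child in root.children if include(child)]


def search_pages(all_pages: List[Page], include: Callable[[Page], bool] = lambda p: True) -> List[Page]:
    """Search: only new or changed pages, regardless of where they are located."""
    return [page for page in all_pages if page.changed and include(page)]


# Sample structure: Start -> (News -> Item, About); only Item is changed.
item = Page(3, "Item", changed=True)
news = Page(2, "News", children=[item])
about = Page(4, "About")
start = Page(1, "Start", children=[news, about])
```

With this sample structure, the tree model sees all four pages, the list model sees Start, News and About, and the search model sees only Item. Filtering out News in the tree model also hides Item, because the recursion never reaches it.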

Scheduled Mirroring

New or modified content can be released either manually or automatically. This is defined in the channel properties. Content is manually released by the editor from the Action window.

The Mirroring service in Admin mode can be used to set up a scheduled job, so that content is automatically mirrored at, for example, a set time every day or week.

When changes are approved, they are handed over to the destinations defined on the channel. The changes are queued in the database for each and every destination.

Technical Description

On the sender side, a "state" is kept that records which pages have been sent to a given destination. Whenever the mirroring job is run, the current site structure is extracted and compared with the recorded "state" of the receiver site. This comparison leads to a number of operations being queued, such as "publish", "move" and "delete". After the comparison and the queuing of operations, the recorded "state" of the receiver site is updated to reflect the new situation.
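The comparison on the sender side can be pictured with the following minimal Python sketch. It is an illustration only: the shape of the "state" (page ID mapped to parent ID and version number) and the function names are assumptions, not the actual EPiServer data model.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shape of a "state": page ID -> (parent page ID, version number).
State = Dict[int, Tuple[Optional[int], int]]


def compare(current: State, recorded: State) -> List[Tuple[str, int]]:
    """Compare the current site structure with the recorded state of the receiver."""
    operations = []
    for page_id, (parent, version) in current.items():
        if page_id not in recorded:
            operations.append(("publish", page_id))   # page is new to the receiver
            continue
        old_parent, old_version = recorded[page_id]
        if parent != old_parent:
            operations.append(("move", page_id))      # page has a new parent
        if version != old_version:
            operations.append(("publish", page_id))   # page content was updated
    for page_id in recorded:
        if page_id not in current:
            operations.append(("delete", page_id))    # page no longer exists
    return operations


def run_mirroring_job(current: State, recorded: State) -> List[Tuple[str, int]]:
    """Queue the operations, then update the recorded state in place."""
    queue = compare(current, recorded)
    recorded.clear()
    recorded.update(current)
    return queue
```

Because the recorded state is updated after each run, a second run without any edits queues nothing. Clearing the recorded state makes the next run queue a "publish" for every page, which is also what happens at the very first mirroring.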

The receiver side keeps a "mapping table", which records the mapping between the sender page IDs and the receiver site page IDs. So, when the receiver side receives a request from the sender to "publish", "move" or "delete" a page, it uses that sender page ID to look up the corresponding local page ID - mapping it in other words. This "mapping table" is built on-the-fly as mirroring requests are received by the Web service.
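As a rough illustration of the "mapping table", the following Python sketch shows a receiver that creates a page the first time a sender page ID is seen and updates the same local page on later requests. The Receiver class, its fields and the way local IDs are allocated are hypothetical; removing the mapping entry on "delete" is also an assumption.

```python
from typing import Dict


class Receiver:
    """Hypothetical receiver that builds its mapping table on-the-fly."""

    def __init__(self, first_free_id: int = 100) -> None:
        self.mapping: Dict[int, int] = {}   # sender page ID -> local page ID
        self.pages: Dict[int, str] = {}     # local page ID -> page content
        self._next_id = first_free_id

    def publish(self, sender_id: int, content: str) -> int:
        local_id = self.mapping.get(sender_id)
        if local_id is None:
            # First time this sender page is received: create a local page
            # and remember the mapping.
            local_id = self._next_id
            self._next_id += 1
            self.mapping[sender_id] = local_id
        # Known page: the existing local page is updated, not re-created.
        self.pages[local_id] = content
        return local_id

    def delete(self, sender_id: int) -> None:
        local_id = self.mapping.pop(sender_id, None)
        if local_id is not None:
            self.pages.pop(local_id, None)
```

Publishing the same sender page twice returns the same local page ID both times, which is the behaviour that distinguishes mirroring from a repeated import.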

A "reset state" option is available on the sender side to clear the recorded "state", but this does not affect the receiver site. When resetting the state on the sender site, it is also a good idea to reset the receiver site at the same time. Currently, the best way to do this is to delete the pages on the receiver side and empty the Recycle Bin. This will ensure that the sender and receiver are in agreement. The "mapping table" will still be there, but all its targets have been deleted, so it will clean itself up during the next mirroring operation.

Mirroring also picks up referenced files, and will begin a mirroring operation by sending these to the receiver using a Web service. Once that is completed, the actual page updates are sent. This will ensure that when pages are published, the relevant files will be there. It will also keep the size of each transfer down, since each file is sent separately.

Export/Mirroring at Initial Deployment

The first time content is mirrored from the sender to the receiver, the mirroring should run without error as the receiver side has no current "states". When mirroring initially starts, the sender side sees that no content has been mirrored and will send all content regardless.

At the receiver end, there is no mapping table either, so the receiver will pick up the pages and effectively build an identical parallel tree. This might not be what you want, but it should not be a problem, as you can just switch the start page to the newly mirrored tree and remove the imported one at your leisure. The initial mirroring requires a full overwrite on the target.

Mirroring Locking

Mirroring is not run as a single "transaction", and sender databases are not locked for read/write while a site is mirrored. If editing is being done at the time of the mirroring, there is a slight risk of short-term inconsistencies. These are resolved at the next mirroring operation, so there is no risk of long-term inconsistencies.

Example:  Consider a "delete". Imagine that you start comparing the sender site with the recorded "state" of the receiver and find a page as unchanged there. After that an editor deletes the page. That change will not be detected by that mirroring operation. There are other more complex scenarios, but generally any inconsistencies will be fixed by the next run.

Typical Mirroring Scenarios

This chapter describes some typical scenarios for when mirroring may be used.

Mirroring to Another EPiServer CMS 5

Mirroring to another EPiServer CMS 5 can be summarized as an automated export/import between two Web sites. When the content is changed, it is packaged into an export package in EPiServer CMS 5 and sent via a Web service to the other Web site, where it is unpacked and imported into the system.

There are, however, several differences between mirroring and the standard export/import in Admin mode.

  1. The receiver remembers which pages have been received for a certain channel and will make sure that the next time the same page is received, it will be updated instead of being re-created.
  2. Files and images will not be packaged and sent inside the export package. They are sent separately before the actual export package is sent.

Microsoft Web Service Extensions (WSE), which supports DIME, makes it more efficient to send binary files.

Mirroring to another EPiServer CMS 5 can be useful, for example, if you have a development/test environment that is mirrored to an external environment.

Mirroring to HTML

EPiServer CMS 5 can mirror pages as HTML files by sending a Web request to the page's URL. The request downloads the content, which is stored as HTML files in the local file system on the server. In this way, a tree structure in EPiServer CMS 5 is mirrored to a tree structure in the file system, where each page becomes a folder. The function also searches all downloaded HTML for references to images, links and style sheets.

Note:  Links must be relative to the site for this to work; otherwise the links will be left untouched.

Note: Be aware that only resources referred to in EPiServer CMS 5 pages can be mirrored. For example, when mirroring a page that refers to a stylesheet, and this stylesheet in turn refers to an image, the stylesheet will be mirrored since the page referred to it, the image will not.
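The search for references can be pictured with the following Python sketch, which uses the standard html.parser module. It is not the EPiServer implementation: the class and function names are hypothetical, and "relative to the site" is assumed here to mean a path that starts with a single "/".

```python
from html.parser import HTMLParser
from typing import List, Tuple


class ReferenceCollector(HTMLParser):
    """Collect image, link and style sheet references from downloaded HTML."""

    # Tag name -> attribute that carries the reference.
    ATTRS = {"img": "src", "a": "href", "link": "href"}

    def __init__(self) -> None:
        super().__init__()
        self.site_relative: List[str] = []   # candidates for mirroring
        self.untouched: List[str] = []       # left as-is in the output

    def handle_starttag(self, tag, attrs):
        wanted = self.ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Assumption: site-relative means "/path", not "//host/path".
                if value.startswith("/") and not value.startswith("//"):
                    self.site_relative.append(value)
                else:
                    self.untouched.append(value)


def collect_references(html: str) -> Tuple[List[str], List[str]]:
    collector = ReferenceCollector()
    collector.feed(html)
    return collector.site_relative, collector.untouched
```

Only the HTML of the page itself is parsed in this sketch, which mirrors the limitation described in the note above: an image referenced from inside a style sheet is never seen.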

Mirroring to HTML takes longer than the other types of mirroring, as the Web server must be contacted for each page and the content downloaded. As a change to a single page can affect a large number of pages, e.g. through menus, there is a setting that makes each update fetch all pages again and check whether they are affected by the change.
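One way to picture the "fetch all pages again and check" behaviour is a checksum comparison per page. The following Python sketch is an assumption about how such a check could work; the document does not describe the actual algorithm, and the function names and the choice of SHA-256 are illustrative only.

```python
import hashlib
from typing import Dict, List


def checksum(html: str) -> str:
    """Checksum of the downloaded HTML (SHA-256 chosen for illustration)."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def pages_to_rewrite(downloaded: Dict[str, str], stored: Dict[str, str]) -> List[str]:
    """Return the URLs whose HTML differs from the last mirrored version.

    downloaded maps URL -> freshly fetched HTML.
    stored maps URL -> checksum from the previous run and is updated in place.
    """
    changed = []
    for url, html in downloaded.items():
        digest = checksum(html)
        if stored.get(url) != digest:
            changed.append(url)
            stored[url] = digest
    return changed
```

The cost lies in the download rather than in the comparison: every page must be fetched before its checksum can be computed, which is why this kind of check slows HTML mirroring down.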

Mirroring to XML

XML mirroring works in a similar way to HTML mirroring in that each page will be mirrored as a file on the local file system. The only difference is that instead of downloading HTML from the page, the properties will be extracted and formatted to an XML document via an XML style sheet (XSLT).

FAQ

The mirroring process takes a very long time to run. RAM is not a problem, but the CPU is very busy. Is this normal?

Inserting a page is a very heavy operation in EPiServer CMS 5, so it is probably quite natural that the import takes as long as it does. This should not be an issue after the initial mirroring, as it is the insertion of pages that is heavy, and a typical mirroring operation is not at all comparable to importing/initially mirroring the whole site.

However, if you are still experiencing problems after the initial mirroring, it may be due to the fact that large objects (over 64k) are not always handled by the garbage collector of ASP.NET. Refer to the FAQ "Performance issues and OutOfMemoryException".

When a site is being mirrored, is a read/write lock placed on the sender database?

Mirroring is not run as a single "transaction", so the sender database is not locked during mirroring. If editing is being done at the time of the mirroring, there is a slight risk of short-term inconsistencies. These are resolved at the next mirroring operation, so there is no risk of long-term inconsistencies.

What happens when a Web site visitor requests a page that is currently being updated on the receiver side?

If a visitor requests a page just as it is being updated by the mirroring receiver, the page the visitor sees will depend on the visitor's timing, i.e. either the new or the old page.

We want to break down the mirroring into the main sections of the Web site, but are worried that links between sections will break. Is there a setting to ensure that the links do not break?

It is possible to break mirroring of EPiServer sites into smaller sections. The way to do this is by configuring several channels and selecting the "Allow receiver to fetch links from other channels" check box in the EPiServer Destination window.

What happens when I export a file that is part of a page folder?

When exporting a file that is part of a page folder (or subfolder) all files in that folder (or subfolder) will be exported.

Troubleshooting

Error: Server found request content type to be 'application/dime', but expected 'text/xml'

Make sure that you added the Web service extension in web.config on the destination server, as described in "Step 1 - Set Up the Destination Site".

Error: Found a high surrogate char without a following low surrogate. The input may not be in this encoding, or may not contain valid Unicode (UTF-16) characters

Make sure that you added the Web service extension in web.config on the destination server, as described in "Step 1 - Set Up the Destination Site".

Error: Object moved to here / Access denied

Make sure that the Web Service user has access to log on to the server. For more troubleshooting and configuration options, please refer to the “Web Services” technical note. 

Error: System.Web.HttpException: Maximum request length exceeded

This error occurs when large amounts of information are exported.

To solve the problem, make sure that maxRequestLength is set both on the httpRuntime element and in the WSE messaging section of web.config on the receiving mirroring site. Set maxRequestLength to 40960 KB (40 MB).

Example:

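<configuration>
       <system.web>
              <!-- maxRequestLength is given in kilobytes -->
              <httpRuntime maxRequestLength="40960" />
       </system.web>
       <microsoft.web.services2>
              <messaging>
                     <!-- maxRequestLength is given in kilobytes -->
                     <maxRequestLength>40960</maxRequestLength>
              </messaging>
       </microsoft.web.services2>
</configuration>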
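To confirm that both limits are actually set on the receiving site, the configuration file can be inspected with a small script. The following Python sketch is a convenience check only; the function name is hypothetical, and it assumes a web.config whose root element has no default XML namespace.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def request_limits_kb(web_config_xml: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (httpRuntime limit, WSE messaging limit) in KB, or None if unset."""
    root = ET.fromstring(web_config_xml)

    runtime = root.find("system.web/httpRuntime")
    runtime_limit = None
    if runtime is not None and runtime.get("maxRequestLength"):
        runtime_limit = int(runtime.get("maxRequestLength"))

    wse = root.find("microsoft.web.services2/messaging/maxRequestLength")
    wse_limit = None
    if wse is not None and wse.text and wse.text.strip():
        wse_limit = int(wse.text.strip())

    return runtime_limit, wse_limit


# Minimal sample with both limits set to 40 MB.
SAMPLE = """
<configuration>
  <system.web>
    <httpRuntime maxRequestLength="40960" />
  </system.web>
  <microsoft.web.services2>
    <messaging>
      <maxRequestLength>40960</maxRequestLength>
    </messaging>
  </microsoft.web.services2>
</configuration>
"""
```

If either value comes back as None, the corresponding setting is missing from the file and the default limit applies.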


Error: Timeout exceeded

To solve the problem of an exceeded timeout, increase the timeout value in Internet Information Services (IIS). It may also be necessary to increase the timeout value for the receiving site. See http://msdn.microsoft.com/library/en-us/wse/html/940ecc18-25ce-45d8-b040-408d931d9fe1.asp?frame=true for further information.

Configuring and Testing Mirroring

The following instructions show how to set up a destination site and a source site so that you can publish content from your source site to the destination site.

Step 1 - Set Up the Destination Site

  1. Install a new site. This is the destination site.
  2. This step only applies if your destination site is using Forms authentication.
    Set up the new site with basic authentication. Make sure that the IIS directory security is set to Basic authentication AND that anonymous access is NOT allowed.  
    1. Open Internet Information Services Manager on the Web server and select the /WebServices folder on your remote EPiServer CMS 5 Web site.
    2. Right-click and select Properties. Under the Directory Security tab, click Edit.
    3. The authentication options must be configured for Basic Authentication only. Otherwise automatic authentication will not occur.
    4. Edit the Web configuration file, web.config, in the root directory. Make sure that the BasicAuthentication filter defined under the httpModules section is not commented out. The BasicAuthentication HTTP module translates basic authentication requests on-the-fly to forms-authenticated cookies.

      <httpModules>
        <add name="BasicAuthentication" type="EPiServer.Security.BasicAuthentication, EPiServer" />
      </httpModules>

    5. Assign a user or group permission to access the Mirroring Web Service. To do this, log on to EPiServer Admin mode. Click the Permissions for Functions link on the Config tab. Click the Edit button for the Allow the user to act as a web service user option. Add the user or group that you wish to grant access to the web service. Note that the users and groups available here are those of the destination site, not the source site, because the authentication is performed at the destination site. The user or group chosen here must have permission to write files to the VPP directory. For more information see the Virtual Path Providers in EPiServer CMS 5 technote.
    6. Test the setup by opening a Web browser and entering the URL to a Web service on the destination site, for example: http://localhost/RemoteSite/WebServices/PageMirroringService.asmx. You will receive a standard Windows login pop-up window.
    7. Enter the Web service user account information. If everything is working, you should see the Web Service definition page.
    8. Update the web.config file on the destination site to enable SOAP extensions. Add the following under the <system.web>  section:

      <webServices>

        <soapExtensionTypes>

            <add type="Microsoft.Web.Services2.WebServicesExtension, Microsoft.Web.Services2, Culture=neutral, PublicKeyToken=31bf3856ad364e35" priority="1" group="0" />

        </soapExtensionTypes>

      </webServices> 
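The browser test in the steps above can also be scripted. The following Python sketch only builds a request that carries a Basic authentication header for the service URL used in this document; the function name and the credentials in the usage note are hypothetical.

```python
import base64
import urllib.request


def build_service_request(site_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for the mirroring Web service with Basic authentication."""
    url = site_url.rstrip("/") + "/WebServices/PageMirroringService.asmx"
    # Basic authentication sends "user:password" encoded as Base64.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request
```

Passing the result to urllib.request.urlopen, for example with build_service_request("http://localhost/RemoteSite", "mirroruser", "secret"), sends the request. A successful response suggests that authentication is working, whereas an HTTP 401 error indicates that the credentials were rejected or that the user lacks the Web service permission.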

Step 2 - Set Up the Source Site

  1. Log on to Admin mode. Click the Remote Web Sites link on the Config tab.
  2. Click Create. Enter a name in the Name field and the URL of the destination site, e.g. http://localhost/RemoteSite, in the URL field.
  3. Enter the details of the user, password, and domain that will be used to access the web service. This should be the same user granted access to the web service on the destination site. Alternatively, if a group was granted access instead of a user then the user entered here should be a member of that group. 
  4. Click Save. Click Ping and verify that the connection between the source and destination site works.

Test Scenario

Make sure that you have followed the instructions in the previous chapter regarding configuration of content mirroring.

Create a Channel and Destination in Your Source Site

  1. In the source site, go to Admin mode, click the Config tab and then Mirroring Administration.
  2. Click Create. Enter a name, e.g. SourceChannel, in the Name field. Choose the page from which you wish to publish the tree structure. (The page should have children.)
  3. Choose "Tree" in the Mirror Type box.
  4. Select Include the start page. Click Save.
  5. Click Create Destination. Select “EPiServer” in the Select destination type box. Click OK.
  6. Enter EPiServerDest in the name field. Choose your remote site in the Remote site box. Choose a page at the remote site and enter the page’s ID in the Root page on destination field. Click Save.

Publish a Page to the Remote EPiServer CMS 5 Site

  1. In the source site, edit the page that you selected as the publishing start page above. Save and publish the page.
  2. Open the Action Window and click Approve mirroring updates. A list of updated channels appears and “SourceChannel” is listed with the number of updated pages in parentheses. Click SourceChannel.
  3. The currently updated pages are listed and a Publish button appears at the bottom. Click Publish.
  4. Go to Mirroring Administration on the Config tab in Admin mode. The number of queued jobs is shown under Queue Length for the channel. (If the scheduled service has already executed, it will say 0.)
  5. If the scheduled service does not run, click Mirroring Service under the Admin tab and click Start Manually. Check that the pages were published on the remote site.

Publish a Page to HTML

  1. In the source site, go to Admin mode, click the Config tab and then Mirroring Administration.
  2. Click SourceChannel and then Create Destination.
  3. Select “HTML” as the destination type and click OK.
  4. Enter EPiServerHTML in the name field.
  5. Create a directory “C:\episerverhtml” in your file system. Enter C:\episerverhtml in the Target directory on the server field. Enter /episerverhtml/ in the Relative root path field if you are to run the remote site from your hard drive. Click Save.
  6. Create a new page in the source site, publish it, and approve the mirroring updates in the Action Window. If the scheduled service does not run, click Mirroring Service under the Admin tab and click Start Manually.
  7. Verify that the pages have been written to “C:\episerverhtml”.

Publish a Page to XML

  1. In the source site, go to Admin mode, click the Config tab and then Mirroring Administration.
  2. Click SourceChannel and then Create Destination.
  3. Select “XML” as the destination type and click OK.
  4. Enter EPiServerXML in the name field.
  5. Create a directory “C:\episerverxml” in your file system. Enter C:\episerverxml in the Target Directory on the server field.
  6. Create a file called template.xsl under C:\episerverxml and fill it with the text in the "template.xsl" chapter of the Appendix. (The demo template.xsl is only a basic XSL example.)
  7. Enter C:\episerverxml\template.xsl in the Path to XSL template field. Click Save.
  8. Create a new page in the source site, publish it, and approve the mirroring updates in the Action Window. If the scheduled service does not run, click Mirroring Service under the Admin tab and click Start Manually.
  9. Verify that the pages have been written to “C:\episerverxml”.

Appendix

template.xsl

<!--
      - XSLT is a template based language to transform Xml documents
        It uses XPath to select specific nodes
        for processing.

      - A XSLT file is a well formed Xml document
-->

<!-- every StyleSheet starts with this tag -->
<xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="1.0">

      <!-- indicates what our output type is going to be -->
      <xsl:output method="xml" />

      <!--
            Main template to kick off processing our Sample.xml
            From here on we use a simple XPath selection query to
            get to our data.
      -->
      <xsl:template match="/">
            <Page>
                  <Title><xsl:value-of select="/page/properties/property[@name='PageName']/text()"/></Title>
                  <Body><xsl:value-of select="/page/properties/property[@name='MainBody']/text()"/></Body>
            </Page>
      </xsl:template>

</xsl:stylesheet>
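To see what the template selects, the same two XPath lookups can be reproduced in Python with the standard xml.etree.ElementTree module. This is only an illustration: the shape of the input document (a page root element containing properties/property elements with a name attribute) is inferred from the XPath expressions in the template, and the transform function is hypothetical.

```python
import xml.etree.ElementTree as ET


def transform(page_xml: str) -> str:
    """Mimic template.xsl: copy PageName and MainBody into a small Page document."""
    page = ET.fromstring(page_xml)   # root element is assumed to be <page>

    def prop(name: str) -> str:
        # Same selection as /page/properties/property[@name='...'] in the template.
        node = page.find(f"properties/property[@name='{name}']")
        return (node.text or "") if node is not None else ""

    out = ET.Element("Page")
    ET.SubElement(out, "Title").text = prop("PageName")
    ET.SubElement(out, "Body").text = prop("MainBody")
    return ET.tostring(out, encoding="unicode")


# Assumed input shape, for illustration only.
SAMPLE_PAGE = (
    "<page><properties>"
    '<property name="PageName">News</property>'
    '<property name="MainBody">Hello</property>'
    "</properties></page>"
)
```

Properties that the template does not mention are simply dropped from the output, so the template is also the place to decide which properties are exposed externally.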

Mirroring Field Descriptions in Admin Mode

Mirroring Administration Window

  • Channel Name - Lists all the channels that are available for the site. Click a channel name to display the Destination Overview window.
  • Create (button) - Click Create to create a new channel. This opens the Mirroring Settings window.

Mirroring Settings Window

This window is displayed when you create or edit a channel.

Information tab

  • Name - Name of the channel.
  • Start page - Select the page from which you want to publish the structure. The page must contain pages or sub-folders.
  • Mirror type - Select the type of mirroring to be done. See the "Basic Concepts" chapter for further information on the different mirroring types.
  • Globalization support - Applies to globalized Web sites. Select whether you want to mirror only the original language, all languages or another language.
  • Approve changes automatically - Select this check box if you want any changes to be approved automatically instead of approving them manually in the Action Window in Edit mode. Any changed pages will then be updated on the receiving site the next time the mirroring service is run.
  • Include the start page - Select this check box if you also want the changes to apply to the start page.
  • Run as anonymous user - Select this check box if you want to run the mirroring job as an anonymous user. If not, enter a username, password and domain.

Property Filter tab

  • Activate filter - Select this check box if you want to activate the filter settings.
  • Filter by property name - Enter a property name, e.g. WriterName, to mirror only pages that include that property.
  • Filter by property value - Enter a property value for the stated property name. For example, if you enter property name WriterName and property value Charlie, only pages that include the value Charlie in the Writer field will be included in the mirrored site.
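The effect of the property filter can be sketched as a simple predicate in Python. The function name and the dictionary representation of page properties are hypothetical, and treating the value comparison as an exact match is an assumption.

```python
from typing import Dict, Optional


def passes_filter(page_properties: Dict[str, str], name: str, value: Optional[str] = None) -> bool:
    """Return True if the page would be included by the property filter.

    With only a property name, the page must have that property.
    With a name and a value, the property must also equal the value
    (exact match is an assumption made for this sketch).
    """
    if name not in page_properties:
        return False
    return value is None or page_properties[name] == value
```

For example, a page with WriterName set to Charlie passes a filter on WriterName with the value Charlie, while a page written by someone else, or a page without the WriterName property, does not.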

Destination Overview Window

This window displays an overview of the defined destinations for this channel.

  • Destinations - Lists the destinations that have been created for the channel.
  • Last status - Displays the status of the most recent mirroring execution.
  • Last execution - Displays the date of the most recent mirroring execution.
  • Queue length - Displays how many jobs will be run the next time mirroring is executed.
  • Edit Queue (button) - Opens the Mirroring Queue for Destination window from where you can delete pages and packages that should not be included when mirroring the site.
  • Mappings (button) - Opens the Mirroring Mappings window from where you can delete individual mappings or all mappings.
  • Settings (button) - Opens the Mirroring Settings window.
  • Reset State (button) - Click Reset State on the sender side to clear the recorded "state". This will not affect the receiving site. You may want to reset the receiver site as well when you do this. Currently, the best way to do this is to delete the pages and empty the Recycle Bin. This will ensure that the sender and receiver are in agreement.

Destination Window

This window varies depending on the destination of the mirrored site:

  • Mirror to EPiServer
  • Mirror to HTML
  • Mirror to XML

Mirror to EPiServer

  • Select destination type - EPiServer

Information tab

  • Name - Enter a name for the destination.
  • Remote site - Select a receiving site from the drop-down list.
  • Root page on destination - Enter the page ID of the root page on the receiving site.
  • Publish pages - Select this check box if you want to publish the pages automatically on the receiving site. If you leave this check box empty, the mirrored pages must be published manually on the receiving site.
  • Allow receiver to fetch links from other channels - Select this check box if you want to be able to mirror content between different channels.

Queue tab

This tab is only available if the changed pages to be mirrored have been approved. This tab is for information only.

  • Queue number - Lists the numbers of the queues to be included in the next mirroring execution.
  • Item created - Date when the pages to be mirrored were changed.

Mirror to HTML

  • Select destination type - HTML

Information tab

  • Name - Enter a name for the destination.
  • Target directory on the server - Create a directory in your file system where you want the HTML pages to be published. Enter the name of the directory in this field, e.g. C:\episerverhtml.
  • Relative root path - Enter a prefix to be used for all the links. For example, enter "/episerverhtml/" in this field if you have selected C:\episerverhtml as the root directory and you are to run the remote site from your hard drive.
  • Default name for files - Change this value if you want your HTML files to have an alternative name as default.
  • Use the following property for folder names - This field defines the name of the property that controls the name of the folder after mirroring to HTML.
  • Verify check sum for all pages every time - If this check box is selected, all the pages are downloaded every time a change is discovered on a page, since one page can change many pages through listings, site maps, etc. Mirroring is faster if you clear this check box when you do not need this function.
  • Do not include file name in links - Select this check box if you will be running the site online, where it is usually preferable to use only the folder name for links and to configure Default.htm as the default document in Internet Information Services (IIS). Leave it cleared if you will be running the HTML site from a CD or hard drive and want to include the file name, e.g. Default.htm, in links.
  • Apply channel filter - It is possible to require that certain properties apply to certain channels. Select this check box if you want listings and the menu tree to be filtered in the same way when HTML is downloaded from a page.

Queue tab

This tab is only available if the changed pages to be mirrored have been approved. This tab is for information only.

  • Queue number - Lists the numbers of the queues to be included in the next mirroring execution.
  • Item created - Date when the pages to be mirrored were changed.

Mirror to XML

  • Select destination type - XML

Information tab

  • Name - Enter a name for the destination.
  • Target directory on the server - Create a directory in your file system where you want the XML pages to be published. Enter the name of the directory in this field, e.g. C:\episerverxml.
  • Path to XSL template - Create a file called template.xsl under your target directory. Fill the file with relevant text. (An example template.xsl can be found in the Appendix of this document.) Enter the path to your XSL file in the Path to XSL template field, e.g. C:\episerverxml\template.xsl.
  • Relative root path - Enter a prefix to be used for all the links. For example, enter "/episerverxml/" in this field if you have selected C:\episerverxml as the root directory and you are to run the remote site from your hard drive.
  • Default name for files - Change this value if you want your XML files to have an alternative name as default.
  • Use the following property for folder names - This field defines the name of the property that controls the name of the folder after mirroring.
  • Do not include file name in links - Select this check box if you will be running the site online, where it is usually preferable to use only the folder name for links and to configure the default file name as the default document in Internet Information Services (IIS). Leave it cleared if you will be running the mirrored site from a CD or hard drive and want to include the file name in links.

Queue tab

This tab is only available if the changed pages to be mirrored have been approved. This tab is for information only.

  • Queue number - Lists the numbers of the queues to be included in the next mirroring execution.
  • Item created - Date when the pages to be mirrored were changed.