
Vladimir Vedeneev
Mar 13, 2011

Cache and GoogleBot

The Case

In production we ran into an interesting issue. Sometimes, on some pages, users experienced something-not-working errors. For example, an AJAX search button behaved like a plain label: you could click it, but nothing happened. These errors appeared for short, unpredictable periods of time, hit all users simultaneously, and then everything worked fine until the next such error.

So it looked like some kind of cache-related error, but how could an invalid page end up in the cache?

After some investigation we recalled that ASP.NET is a really smart thing. It doesn’t render things that a browser doesn’t support. So when, for example, Google requests our page, it receives a “light version” of the page without things like the __doPostBack implementation. And it’s really hard to do a postback without.. hm.. postback.

But why do our users see pages generated for Google?

It’s simple. Google… I like to say it IS Google, but actually it could be any user with one of those rarely used browsers that support nothing but a plain GET. So, say, Google requested a page, and if it wasn’t in the output cache, it was generated and put there. And all users saw that “light” page until it expired.

Yes, I mean it: one user can partially break the site’s functionality for everyone else!
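To make the mechanism concrete, here is a toy console model. It is only a sketch and not how ASP.NET actually implements its output cache: the class OutputCacheDemo and its methods are made up for illustration, and "supports script" stands in for whatever capabilities ASP.NET detects. It shows that when the cache key is the path alone, whoever requests the page first decides what everyone else gets.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the problem; this is NOT ASP.NET's real output cache.
// All type and method names here are hypothetical.
public static class OutputCacheDemo
{
    // Rendered markup, keyed by whatever the cache is told to vary by.
    private static readonly Dictionary<string, string> Cache =
        new Dictionary<string, string>();

    // Stand-in for adaptive rendering: clients judged unable to run
    // script get a "light" page without the __doPostBack function.
    private static string Render(bool supportsScript)
    {
        return supportsScript
            ? "<script>function __doPostBack(){}</script><a>Button</a>"
            : "<a>Button</a>";
    }

    public static string Get(string path, bool supportsScript, bool varyByCapability)
    {
        string key = varyByCapability ? path + "|" + supportsScript : path;
        string html;
        if (!Cache.TryGetValue(key, out html))
        {
            html = Render(supportsScript); // the first requester decides what is cached
            Cache[key] = html;
        }
        return html;
    }

    public static void Reset()
    {
        Cache.Clear();
    }

    public static void Main()
    {
        // Keyed by path only: the crawler arrives first and poisons the entry.
        Get("/page", false, false);
        Console.WriteLine(Get("/page", true, false).Contains("__doPostBack")); // False

        // Keyed by path and capability: the browser gets its own entry.
        Reset();
        Get("/page", false, true);
        Console.WriteLine(Get("/page", true, true).Contains("__doPostBack")); // True
    }
}
```

The second half of Main hints at the trade-off behind the fixes: varying the key protects normal users but multiplies the number of cached copies.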

OK, next in our show: how to reproduce it and ways to fix it.

But first, to understand what’s going on, I highly recommend reading the great post The EPiServer CMS Output Cache Explained by Joel Abrahamsson.

Reproducing

Create a simple EPiServer site from a template and add a page that uses __doPostBack, like this one:

Markup:

<form runat="server" id="form1">
  Generated: <asp:Label ID="lblGenerated" runat="server" /><br />
  Click at: <asp:Label ID="lblClickedTime" runat="server" /> <asp:LinkButton ID="btnClick" runat="server" Text="Button" onclick="btnClick_Click" /><br />
</form>

Code behind:

protected void Page_Load(object sender, EventArgs e)
{
    lblGenerated.Text = DateTime.Now.ToString();
}
 
protected void btnClick_Click(object sender, EventArgs e)
{
    lblClickedTime.Text = DateTime.Now.ToString();
}

Turn on caching in episerver.config, in the sites/site/siteSettings node:

<siteSettings 
    ...
    httpCacheExpiration="1:0:0" 
    httpCacheability="Public" 
    httpCacheVaryByCustom="path" 
    httpCacheVaryByParams="id,epslanguage" 
    ...
/>

Run the site, create the page, and open it in a new tab (the cache doesn’t work in Preview). Log off (pages are cached only for anonymous users). Make sure the page is cached: click Refresh and check that the generated time doesn’t change. Make sure the LinkButton works: the clicked time updates on every click.

Install the User Agent Switcher add-on for Firefox (https://addons.mozilla.org/en-us/firefox/addon/user-agent-switcher/). In Firefox, select Tools – Default User Agent – Search Robots – GoogleBot.

Restart the web site so that the output cache is empty. We’re ready.

Open the page using “Googlebot” in Firefox (you are “Google” now). Try clicking the button and see that it doesn’t work.

Open the same page in another browser, say, IE (now you are a “normal user”) – and see that the button doesn’t work there either! You’ve been sent a raw page generated for Google! Weird.

You can wait until the cached entry expires and reload the page in IE – it will work then.

Q. E. D.

How to fix

There are several ways to avoid this behavior.

  • Disable output cache.

This will work for small sites, but not in the real world.

  • Use the native ASP.NET httpCacheVaryByCustom="browser" setting.

This has side effects. First of all, a page will be cached separately for each combination of browser and major version, which will dramatically increase the memory used by the server output cache. Another is that EPiServer may rely on the “path” setting here to vary the cache by Url.Path.

  • Provide default capabilities for all browsers in an App_Browsers\*.browser file, something like this one:

<browsers>
    <browser refID="Default">
        <capabilities>
          <capability name="browser"                         value="Firefox" />
          <capability name="layoutEngine"                    value="Gecko" />
          <capability name="layoutEngineVersion"             value="${layoutVersion}" />
          <capability name="ecmascriptversion"               value="3.0" />
          <capability name="javascript"                      value="true" />
          <capability name="w3cdomversion"                   value="1.0" />
          <capability name="supportsAccesskeyAttribute"      value="true" />
          <capability name="tagwriter"                       value="System.Web.UI.HtmlTextWriter" />
          <capability name="cookies"                         value="true" />
          <capability name="frames"                          value="true" />
          <capability name="javaapplets"                     value="true" />
          <capability name="supportsCallback"                value="true" />
          <capability name="supportsDivNoWrap"               value="false" />
          <capability name="supportsFileUpload"              value="true" />
          <capability name="supportsMultilineTextBoxDisplay" value="true" />
          <capability name="supportsXmlHttp"                 value="true" />
          <capability name="tables"                          value="true" />
          <capability name="javascriptversion"               value="1.8" />
          <capability name="supportsMaintainScrollPositionOnPostback" value="true" />
        </capabilities>
    </browser>
</browsers>

This will force ASP.NET to render the same HTML for all browsers.
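For comparison, the second option (varying by browser) is a one-attribute change. The fragment below is a sketch that assumes the same siteSettings node used in the Reproducing section, with the other attributes left as they were. Note that "browser" replaces "path" here, so it only suits sites that don’t depend on path-based variation.

```xml
<!-- Sketch only: same node as in the Reproducing section, one attribute changed. -->
<siteSettings 
    ...
    httpCacheExpiration="1:0:0" 
    httpCacheability="Public" 
    httpCacheVaryByCustom="browser" 
    httpCacheVaryByParams="id,epslanguage" 
    ...
/>
```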

 

Do you know a better solution?


Comments

Mar 16, 2011 09:29 AM

A very good blog post. Thank you for tracking this problem down. The whole .browser "feature" of ASP.Net is broken.

Oct 12, 2011 11:29 AM

Regarding the default value of httpCacheVaryByCustom being "path", the reason for that, as I understand it, is that in some scenarios the same page may exist in different paths (eg http://www.example.com/ and http://www.example.com/en/) and if there are relative paths in the output, one of the pages would have broken paths if they would share output caches.

Which means that depending on how the site is set up and coded you may need "path".
In that case it would be possible to create your own custom VaryByCustom handling that combines the browser and the path behavior.

Sticking something along these lines in global.asax.cs (inheriting from EPiServer.Global) + changing the config accordingly ought to work in that scenario (not tested):

public override string GetVaryByCustomString(HttpContext context, string arg)
{
    if (arg == "browserandpath")
    {
        return base.GetVaryByCustomString(context, "browser") + "|" + base.GetVaryByCustomString(context, "path");
    }
    return base.GetVaryByCustomString(context, arg);
}

See http://msdn.microsoft.com/en-us/library/5ecf4420.aspx for details.
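The matching config change would presumably be to pass the custom string name to the cache. This is a sketch that assumes the siteSettings node shown in the post and the "browserandpath" name used in the override above; like the override itself, it is untested.

```xml
<!-- Sketch only: the value must match the string checked in GetVaryByCustomString. -->
<siteSettings 
    ...
    httpCacheVaryByCustom="browserandpath" 
    ...
/>
```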
