I don't know if there's anything in particular that "typically" would cause this, but these are the questions that come to mind:
What charset is the text actually sent in?
What charset does the browser say it is using?
What does the Content-Type header say?
Is there a <meta http-equiv="Content-Type" content="...."> element in the body of the response? If so, what does it say?
I had hoped to have answers for this long ago, but I haven't seen the error myself since the start of October, although I still get reports from users that it is happening.
Here are at least the answers for when the error doesn't show up:
What charset is the text actually sent in?
How do I discover this? I usually just look at what the Content-Type response header says...
What charset does the browser say it is using?
utf-8
What does the Content-Type header say?
Content-Type: text/html; charset=utf-8
Is there a <meta http-equiv="Content-Type" content="....">
Yes, it says: <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
OK, so "everyone" is in agreement at the metadata level that the response is encoded as utf-8... which, given the results you are getting, probably means that it actually isn't.
I would guess that it's either win-1252 or plain iso-8859-1 (it makes no difference for the particular characters you mentioned), but to confirm you could, for instance, use Fiddler, switch to its "HexView" and check what the byte values are for those characters.
For example, the capital Ø is d8 in win-1252 but c3 98 in utf-8. (Compare e.g. http://www.utf8-chartable.de/ and http://msdn.microsoft.com/en-us/goglobal/cc305145. Actually, for these characters I believe you can just compare columns one and three in the former, ignoring the leading zero byte of the Unicode code point.)
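If you would rather not read the charts, the same comparison can be sketched in a few lines of Python (my choice of language here, nothing to do with your site). It prints the byte values of the letters in both encodings, and then shows what a decoder that has been told "utf-8" makes of a win-1252 byte:

```python
# Byte values of the Norwegian letters in the two candidate encodings.
letters = "æøåÆØÅ"

def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated hex pairs."""
    return text.encode(encoding).hex(" ")

for ch in letters:
    print(ch, "cp1252:", hex_bytes(ch, "cp1252"), " utf-8:", hex_bytes(ch, "utf-8"))

# A lone 0xD8 byte is not valid UTF-8, so a decoder that was told "utf-8"
# substitutes U+FFFD, which browsers typically draw as a "?" in a black diamond.
mislabelled = "Ø".encode("cp1252").decode("utf-8", errors="replace")
print(repr(mislabelled))
```

If the bytes you see in HexView match the cp1252 column while the headers say utf-8, that would be consistent with the symptom you describe.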
Also, have you changed any of the charset settings in the configuration/system.web/globalization element? (http://msdn.microsoft.com/en-us/library/39d1w2xf(v=vs.100).aspx)
This is a long shot and a wild guess with no real idea of where to start looking, but... I share Håkan's suspicion that the content isn't actually written to the response stream as UTF-8. I suppose the encoding used can be set somewhere on the Response object, its streams or its stream writer. Perhaps it is set by ASP.NET based on request headers (Accept-Charset, for example). If you have the output cache enabled, rendered responses are cached for a period of time, and once they have expired the next request causes ASP.NET to re-render the page and cache it again. Theoretically, then, if ASP.NET selects a different encoding than UTF-8 for certain requests, that "faulty" response would be cached for that specific page for a period of time, which would explain the behaviour you see.
In the hex viewer in Fiddler the hexadecimal values correspond to utf-8 encoding (Ø => C3 98). The response is sent compressed, and I had to click the notification that shows when opening the Fiddler hex viewer to see the response uncompressed ("Response is encoded and may need to be decoded before inspection. Click here to transform"), but I guess that can't change the character encoding to utf-8?
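As far as I understand, HTTP compression is lossless at the byte level, so decoding it should hand back exactly the bytes the server produced. A quick Python sketch of that assumption, assuming the compression is gzip (I have not checked which one the server actually uses):

```python
import gzip

# Bytes as the server would produce them before compression (UTF-8 here).
original = "Ø".encode("utf-8")

# Content-Encoding: gzip only wraps those bytes; decompressing restores them exactly.
restored = gzip.decompress(gzip.compress(original))

print(original.hex(" "), restored.hex(" "))
```

So if the decoded view shows C3 98, the compressed response should have carried those same two bytes.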
Here is the globalization-element from web.config:
<globalization culture="nb-no" uiCulture="no" requestEncoding="utf-8" responseEncoding="utf-8" resourceProviderFactoryType="Episerver.Resources.XmlResourceProviderFactory, EpiServer" />
I don't think it has been changed in over a year, but then again I don't know for how long this problem has existed, since it doesn't happen very often.
I do have output caching activated, so the caching of faulty responses triggered by certain "unorthodox" request headers sounds like a very likely explanation for the behaviour I'm seeing. I think I'll try to deactivate caching, send a few requests with different headers and see if I can reproduce the error. Thanks for the help so far.
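A rough Python sketch of that experiment, to be run with output caching turned off. The list of Accept-Charset values is only my guess at what an "unorthodox" client might send, and whether ASP.NET reacts to that header at all is exactly what is being tested. The names `is_valid_utf8` and `probe` are made up for this sketch.

```python
import urllib.request

# Hypothetical values; real "unorthodox" clients may send something else entirely.
ACCEPT_CHARSETS = ["utf-8", "iso-8859-1", "windows-1252", "iso-8859-1,utf-8;q=0.7", "*"]

def is_valid_utf8(body: bytes) -> bool:
    """True if the raw response bytes decode cleanly as UTF-8."""
    try:
        body.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def probe(url: str, charsets=ACCEPT_CHARSETS) -> dict:
    """Request `url` once per Accept-Charset value and record whether the body is valid UTF-8."""
    results = {}
    for charset in charsets:
        request = urllib.request.Request(url, headers={"Accept-Charset": charset})
        with urllib.request.urlopen(request, timeout=10) as response:
            results[charset] = is_valid_utf8(response.read())
    return results
```

Calling `probe(...)` with the address of an affected page would return one True/False per header value. A False means the body contained bytes that are not valid UTF-8. Note that this check would not catch a response where the letters had already been replaced by literal "?" characters, since those are valid UTF-8.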
Did this response that you inspected (and which seemed to be utf-8 encoded) also show incorrectly in the browser?
No. I haven't seen a page show up incorrectly in over a month now. The last time I got feedback from users seeing this was two weeks ago. But I think it happens more often than that, since most users probably don't bother pointing it out.
If I do discover a page with faulty encoding, I'll post info about the headers and meta attributes.
Hi,
I have a strange encoding problem, and I don't know whether it is linked to Episerver or not. Once in a while a page on my site will start displaying æ, ø and å as a ?-sign. This persists for a few minutes and then everything is OK again. This is not a browser problem, since when the encoding problem strikes it is visible from any client. It is also not server-wide, since every other page on the site is still correctly encoded. I have also seen this happen in edit mode, where the property labels got ?-signs inside a black diamond shape.
Has anyone else experienced this, or does anyone know what might be causing it?
Best regards,
Frank W. Johannessen.