
Accented characters in URLs not being replaced




We have found an issue where certain characters that should form part of the URL are removed instead of being replaced.

For example, we have the following page title (in Czech), which also forms the URL:

Náměty pro cvičení

Episerver correctly replaces the "á" with a standard "a".
However, the "ě" and the "č" are just being removed from the URL.

The URL is therefore: /Namty-pro-cvieni/ when it should be /namety-pro-cviceni/

Is there a "mapping" file somewhere that allows you to specify replacement characters?


Mar 02, 2011 16:20

You can change this behavior by attaching a handler to this event:

UrlSegment.CreatingUrlSegment += new EventHandler<UrlSegmentEventArgs>(UrlSegment_CreatingUrlSegment);

Then you can change the replacement string or, as in this example, translate the page name to English when the resulting URL segment consists only of "-" characters:

        static void UrlSegment_CreatingUrlSegment(object sender, UrlSegmentEventArgs e)
        {
            // Start from the existing URL segment; fall back to the page name.
            string uRLSegment = e.PageData.URLSegment;
            if (string.IsNullOrEmpty(uRLSegment))
            {
                uRLSegment = e.PageData.PageName;
                if (uRLSegment == null)
                    uRLSegment = "";
            }
            string urlFriendlySegment = ReplaceIllegalChars(uRLSegment);
            // If nothing but dashes is left, translate the page name to English instead.
            if (string.IsNullOrEmpty(urlFriendlySegment) || urlFriendlySegment.Replace("-", "") == "")
                urlFriendlySegment = TranslateUsingGoogle(e.PageData.PageName, e.PageData.LanguageID, "en");
            e.PageData.URLSegment = urlFriendlySegment;
        }

        internal static string ReplaceIllegalChars(string inputString)
        {
            string InvalidSegmentNames = @"%|^COM[0-9]([/\.]|$)|^LPT[0-9]([/\.]|$)|^PRN([/\.]|$)|^CLOCK\$([/\.]|$)|^AUX([/\.]|$)|^NUL([/\.]|$)|^CON([/\.]|$)";
            Regex regexValidUrlChars = new Regex(@"^[A-Za-z0-9\-_~]+$", RegexOptions.Compiled);
            Regex regexFindInvalidUrlChars = new Regex(@"[^A-Za-z0-9\-_~]{1}", RegexOptions.Compiled);
            Regex regexInvalidSegmentNames = new Regex(InvalidSegmentNames, RegexOptions.Compiled | RegexOptions.IgnoreCase);

            StringBuilder builder = new StringBuilder(inputString);
            MatchCollection matchs = regexFindInvalidUrlChars.Matches(inputString);
            for (int i = 0; i < matchs.Count; i++)
            {
                // Use the replacement from the character map if there is one;
                // otherwise mark the character with '?' so it is removed below.
                object obj2 = UrlSegment.GetURLCharacterMap()[builder[matchs[i].Index]];
                if (obj2 != null)
                    builder[matchs[i].Index] = (char)obj2;
                else
                    builder[matchs[i].Index] = '?';
            }
            builder.Replace("?", "");
            return builder.ToString();
        }


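        public static string TranslateUsingGoogle(string text, string fromLang, string toLang)
        {
            if (fromLang == null)
                fromLang = "auto";
            if (toLang == null)
                toLang = "en";

            // The format string below has no service URL in front of {0},
            // so the request cannot succeed exactly as posted.
            string address = string.Format("{0}&langpair={1}", text, fromLang + "|" + toLang);
            string str7 = new WebClient().DownloadString(address);
            // Cut the contents of the result_box element out of the returned HTML.
            str7 = str7.Substring(str7.IndexOf("id=result_box"), 500);
            str7 = str7.Substring(str7.IndexOf(">") + 1);
            str7 = str7.Substring(0, str7.IndexOf("</div"));
            // Strip any remaining tags, such as <span> wrappers.
            Regex removeSpan = new Regex("<[^>]*>");
            return removeSpan.Replace(str7, "");
        }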


Edited, Mar 02, 2011 20:37

Thanks for the reply, but it doesn't recognise the characters that are causing my issues, so it replaces them with nothing, just like the built-in functionality.

Is it possible to add items to the GetURLCharacterMap() collection?

Mar 03, 2011 9:33

You have to write your own CreatingUrlSegment handler and, for instance, replace "ě" and "č" with "e" and "c" before you call the base method.
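
A rough sketch of that pre-replacement step, in case it helps. Nothing here is part of the EPiServer API: the class name UrlSegmentPreprocessor, the method ReplaceAccents and the Overrides table are made up for illustration. It first applies a small explicit map (including "ł" as an example of a letter that has no Unicode decomposition), then strips the remaining diacritics with the standard .NET string.Normalize and CharUnicodeInfo calls.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of EPiServer.
public static class UrlSegmentPreprocessor
{
    // Explicit replacements, applied first. Extend as needed.
    private static readonly Dictionary<char, string> Overrides = new Dictionary<char, string>
    {
        { 'ě', "e" }, { 'Ě', "E" },
        { 'č', "c" }, { 'Č', "C" },
        { 'ł', "l" }, { 'Ł', "L" }  // no Unicode decomposition, so it must be mapped by hand
    };

    public static string ReplaceAccents(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        // Step 1: apply the explicit map.
        StringBuilder mapped = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            string replacement;
            if (Overrides.TryGetValue(c, out replacement))
                mapped.Append(replacement);
            else
                mapped.Append(c);
        }

        // Step 2: decompose what is left (for example "á" becomes "a" plus a
        // combining acute accent) and drop the combining marks.
        string decomposed = mapped.ToString().Normalize(NormalizationForm.FormD);
        StringBuilder result = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                result.Append(c);
        }
        return result.ToString().Normalize(NormalizationForm.FormC);
    }
}

public static class Program
{
    public static void Main()
    {
        // Prints: Namety pro cviceni
        Console.WriteLine(UrlSegmentPreprocessor.ReplaceAccents("Náměty pro cvičení"));
    }
}
```

In the handler posted earlier in this thread, one option would be to pass the page name through ReplaceAccents before handing it to ReplaceIllegalChars, so the accented letters are already plain ASCII by the time the character map is consulted. I have not tested this against an EPiServer site.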

Mar 03, 2011 9:35