I am trying to load the contents of an XhtmlString property into an XDocument so I can filter or change certain HTML elements, for example changing a script into a div with some data attributes. I would rather do the filtering this way than use regular expressions for it.
I noticed that the XhtmlString contains non-closed elements, for example:
Of course I could preprocess the string before loading it into an XDocument for processing, but I find that an ugly solution. I wonder if there is some way in Epi to make sure that the XhtmlString is indeed valid XML, without the need to preprocess it?
:) Welcome to "HTML is not XML" hell. There is a more detailed description of the case here.
Long story short: don't do that. HTML is not XML. Instead, you can use another library to do the heavy lifting (like HtmlAgilityPack or similar).
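To illustrate why an XML parser struggles here, below is a minimal sketch using only `System.Xml.Linq`. The helper name `IsWellFormedXml` and the sample markup are made up for illustration (the example from the original question is not shown in this thread); the assumption is that the editor content contains something like an unclosed `<br>`, which is valid HTML but not well-formed XML.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class HtmlIsNotXmlDemo
{
    // Hypothetical helper: returns true when the markup parses as XML,
    // false when the XML parser rejects it.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            // Wrap in a single root element, because an HTML fragment
            // may contain several top-level nodes.
            XDocument.Parse("<root>" + markup + "</root>");
            return true;
        }
        catch (XmlException)
        {
            // Thrown for unclosed tags, undeclared entities such as &nbsp;, etc.
            return false;
        }
    }
}
```

With this sketch, a fragment like `<p>one<br>two</p>` is rejected while `<p>one<br />two</p>` is accepted, which is the kind of mismatch an HTML-aware parser avoids.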
Thanks for your suggestion. I am now trying out HtmlAgilityPack for this, and rewriting the XhtmlString contents seems to work quite well. I do see weird behaviour though when newing up an XhtmlString with the modified XHTML as a string: some elements may just disappear after the string has been loaded into an XhtmlString, like an empty <div> element. Any idea what could be causing this?
TinyMCE might sometimes try to "fix" the text's HTML. Are you using the old or new version of Tiny?
I am changing the XhtmlString property data in code; I assume TinyMCE only applies any changes when working through the GUI?
True. In that case, sorry, I do not have any idea.
I just looked at the `HtmlStreamReader` type (which seems to be closely related to the question asked). I don't want to revisit that type any time soon ;) So, sorry. I might need to digest the code I saw and try to look for your answer later.
HtmlAgilityPack did the trick for me in the end. It is quite easy to load its document with the contents of an XhtmlString, replace or remove certain nodes, and then write the resulting HTML back into an XhtmlString.
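For anyone landing here later, the approach described above could look roughly like the following. This is a sketch, not the code actually used in this thread: the class name `ScriptRewriter`, the method name, and the `data-script-src` attribute are invented for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced. It works on plain strings so it does not depend on any Episerver types.

```csharp
using HtmlAgilityPack;

public static class ScriptRewriter
{
    // Hypothetical helper: replaces every <script> element with a <div>
    // that carries the script's src in a data attribute.
    public static string ReplaceScriptsWithDivs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant of unclosed tags, unlike XDocument.Parse

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var scripts = doc.DocumentNode.SelectNodes("//script");
        if (scripts == null)
        {
            return html;
        }

        foreach (var script in scripts)
        {
            var div = doc.CreateElement("div");
            // "data-script-src" is an illustrative attribute name.
            div.SetAttributeValue("data-script-src",
                script.GetAttributeValue("src", string.Empty));
            script.ParentNode.ReplaceChild(div, script);
        }

        // Serialize the modified tree back to an HTML string.
        return doc.DocumentNode.OuterHtml;
    }
}
```

The returned string can then be passed to `new XhtmlString(...)` as mentioned earlier in the thread. Given the disappearing empty `<div>` reported above, it is probably worth checking the value after the round trip rather than assuming it survives unchanged.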