I need to output the full content of a page in an RSS feed. However, the page uses Composer, so it isn't as simple as reading the main body property.
So far I can pull the full HTML of the page using:
string html;
using (var response = System.Net.WebRequest.Create(friendlyUrl).GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    html = reader.ReadToEnd();
}
But that returns the entire HTML. I only want the main body content, without the navigation etc. Essentially I need to extract everything inside, say, <div id="content">...</div>.
I noticed EPiServer includes an HTML parser, EPiServer.Framework.HtmlStreamReader, but I'm not sure how to use it to extract all the HTML within a given div.
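If HtmlStreamReader turns out to be awkward, a plain string scan can do the job without it (its API is not used here). Below is a minimal C# sketch, assuming reasonably well-formed markup: it finds the opening <div> with the given id, then counts nested <div> and </div> tags until the depth returns to zero. The class and method names are hypothetical, and "<div" or "</div>" text inside comments, scripts or attribute values will throw the count off.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of EPiServer.
public static class DivExtractor
{
    // Returns the inner HTML of the first <div> whose id equals `id`,
    // or null if the div (or its matching close tag) is not found.
    // Naive: assumes well-formed markup and ignores comments/scripts.
    public static string GetInnerHtmlById(string html, string id)
    {
        // Opening tag such as <div id="content"> (other attributes and either quote style allowed).
        var open = new Regex(
            "<div\\b[^>]*\\bid\\s*=\\s*[\"']" + Regex.Escape(id) + "[\"'][^>]*>",
            RegexOptions.IgnoreCase);
        Match start = open.Match(html);
        if (!start.Success) return null;

        int contentStart = start.Index + start.Length;

        // Matches the start of an opening <div tag or a closing </div> tag.
        var tag = new Regex("<div\\b|</div\\s*>", RegexOptions.IgnoreCase);
        int depth = 1;
        for (Match m = tag.Match(html, contentStart); m.Success; m = m.NextMatch())
        {
            depth += m.Value.StartsWith("</", StringComparison.Ordinal) ? -1 : 1;
            if (depth == 0)
                return html.Substring(contentStart, m.Index - contentStart);
        }
        return null; // no matching </div>: unbalanced markup
    }
}
```

Usage: `string body = DivExtractor.GetInnerHtmlById(html, "content");` returns null rather than throwing when the div is missing, so the feed code can decide what to do with that page.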
You could add two HTML comments around the content and select what's in between with a regex:
string content = Regex.Match(html, "<!-- start -->(.*?)<!-- /stop -->", RegexOptions.Singleline).Groups[1].Value;
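Wrapping the comment-marker regex in a small helper makes the failure case explicit. A minimal C# sketch (the helper name is hypothetical): it escapes the marker text, returns the captured group, and returns null when either comment is missing, so the feed code can skip or flag the page instead of silently publishing an empty entry.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for comment-delimited content.
public static class MarkerExtractor
{
    // Returns the text between the first startMarker and the next endMarker,
    // or null when either marker is missing.
    public static string Between(string html, string startMarker, string endMarker)
    {
        // Escape the markers in case they contain regex metacharacters.
        string pattern = Regex.Escape(startMarker) + "(.*?)" + Regex.Escape(endMarker);

        // Singleline lets "." match newlines, since the content spans many lines.
        Match match = Regex.Match(html, pattern, RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Usage: `string content = MarkerExtractor.Between(html, "<!-- start -->", "<!-- /stop -->");`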
Or pass a parameter with your web request and have the page template hide the parts you don't want.
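On the requesting side, this means adding a query-string parameter to the page URL before calling WebRequest.Create. A minimal C# sketch; the parameter name and value (rss=true) are a hypothetical choice, and the page template would need its own check (e.g. on Request.QueryString["rss"] in a Web Forms template) to hide the navigation when the parameter is present.

```csharp
using System;

// Hypothetical helper for building the feed request URL.
public static class FeedUrl
{
    // Appends name=value to the URL's query string, keeping existing parameters.
    public static string WithParameter(string url, string name, string value)
    {
        var builder = new UriBuilder(url);
        string pair = Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(value);

        // The Query getter may include a leading "?", so strip it before appending.
        string existing = builder.Query.TrimStart('?');
        builder.Query = existing.Length == 0 ? pair : existing + "&" + pair;
        return builder.Uri.AbsoluteUri;
    }
}
```

Usage: `var request = System.Net.WebRequest.Create(FeedUrl.WithParameter(friendlyUrl, "rss", "true"));`. Compared with comment markers, this keeps the "what to hide" decision in the template that renders the page.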
Thanks for the suggestions. The regex approach is how I've managed to achieve it, but I thought the parser might provide a more elegant solution. Unfortunately, the RSS feed will break if another developer removes those specific comments from the HTML in the future, so I'll need to add a warning there.
Is there a way to get the content out of a Composer block without scraping each page? Scraping is quite slow.
I am using FindPagesWithCriteria and need to find a particular block on each page to get its content.
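If scraping can't be avoided, one generic way to reduce the cost is to cache the extracted content per URL for a while, so the feed doesn't download every page on every request. A minimal C# sketch, assuming a hypothetical fetch delegate that downloads one page and extracts its content (for example, the request-plus-regex approach from earlier in the thread); the class name and lifetime are illustrative only.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: caches extracted page content per URL for a fixed period.
public sealed class ContentCache
{
    private readonly Func<string, string> _fetch; // downloads + extracts one page
    private readonly TimeSpan _lifetime;
    private readonly ConcurrentDictionary<string, Tuple<DateTime, string>> _entries =
        new ConcurrentDictionary<string, Tuple<DateTime, string>>();

    public ContentCache(Func<string, string> fetch, TimeSpan lifetime)
    {
        _fetch = fetch;
        _lifetime = lifetime;
    }

    public string Get(string url)
    {
        Tuple<DateTime, string> entry;
        if (_entries.TryGetValue(url, out entry) && DateTime.UtcNow - entry.Item1 < _lifetime)
            return entry.Item2; // fresh cached copy

        // Miss or expired: fetch again. Two concurrent misses may both fetch;
        // the last write wins, which is harmless for read-only page content.
        string content = _fetch(url);
        _entries[url] = Tuple.Create(DateTime.UtcNow, content);
        return content;
    }
}
```

A fixed lifetime means feed items can lag behind page edits by up to that period, so it should be chosen to match how often the pages change.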