Getting full page content for RSS from a page with Composer

I need to output the full content of a page for an RSS feed. However, the page uses Composer, so it is not as simple as just reading the main body property.

So far I can pull the full HTML for the page using:

string html;
using (var response = System.Net.WebRequest.Create(friendlyUrl).GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
    html = reader.ReadToEnd();

But that pulls the entire HTML. I want only the main body content, stripping out navigation etc. So essentially I need to pull out all content within, say, <div id="content">...</div>.

I have noticed there is an HTML parser in EPiServer, EPiServer.Framework.HtmlStreamReader, but I'm not sure how to use it to extract all the HTML within a given div.

 

#62111
Oct 15, 2012 4:56
You could add two HTML comments and select what's in between with a Regex:
Regex.Match(html, "<!-- start -->(.*?)<!-- /stop -->", RegexOptions.Singleline);

Or pass a parameter with your web request and, when it is present, hide the markup you don't want.
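
As a rough sketch of the first suggestion, the comment markers and the Regex could be wrapped in a small helper. The class and method names (FeedContentExtractor, ExtractContent) are made up for illustration, and the sketch assumes the page template really does emit the two marker comments exactly as written in the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FeedContentExtractor
{
    // Assumption: the template wraps the main content in these two comments.
    private static readonly Regex ContentPattern = new Regex(
        "<!-- start -->(.*?)<!-- /stop -->",
        RegexOptions.Singleline | RegexOptions.Compiled);

    // Returns the markup between the markers, or null when they are not found.
    public static string ExtractContent(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return null;
        }

        Match match = ContentPattern.Match(html);

        // Group 1 is the lazily matched content between the two comments.
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }
}
```

Calling FeedContentExtractor.ExtractContent(html) on the downloaded page would then give only the part between the comments. RegexOptions.Singleline matters here because it lets "." match line breaks, so content spanning several lines is still captured.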

#62118
Edited, Oct 15, 2012 9:54
Thanks for the suggestions. The Regex is how I have managed to achieve it, but I thought the parser might provide a more elegant solution. Unfortunately, the RSS feed will break if another developer removes those specific comments from the HTML in the future, so I will need to put a warning in there.
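
Besides a written warning in the template, one option might be to have the feed code fail loudly when the comments go missing, so the breakage is noticed straight away. This is only a sketch: FeedMarkerGuard and EnsureMarkers are hypothetical names, and the marker strings are assumed to be the same two comments used in the Regex.

```csharp
using System;

public static class FeedMarkerGuard
{
    // Assumption: same marker comments as in the Regex suggested earlier in the thread.
    public const string StartMarker = "<!-- start -->";
    public const string StopMarker = "<!-- /stop -->";

    // Throws a descriptive exception when a marker is missing or the
    // stop marker does not come after the start marker.
    public static void EnsureMarkers(string html, string pageUrl)
    {
        int start = html == null
            ? -1
            : html.IndexOf(StartMarker, StringComparison.Ordinal);

        // Only look for the stop marker after the end of the start marker.
        int stop = start < 0
            ? -1
            : html.IndexOf(StopMarker, start + StartMarker.Length, StringComparison.Ordinal);

        if (start < 0 || stop < 0)
        {
            throw new InvalidOperationException(
                "RSS content markers '" + StartMarker + "' / '" + StopMarker +
                "' were not found in " + pageUrl +
                ". The RSS feed depends on them; do not remove them from the template.");
        }
    }
}
```

Whether to throw or just log and skip the item is a judgment call; throwing makes the problem visible quickly, while logging keeps the rest of the feed working.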

#62154
Oct 16, 2012 5:13
Is there a way to get the content out of a Composer block without having to scrape each page? Scraping this way is quite slow.

I am using FindPagesWithCriteria and need to find a particular block on the page to get the content from.

Jon

#151539
Jul 27, 2016 14:07