Getting full page content for RSS from a page with Composer

rachel.goldthorpe@karuba.co.nz

I need to output the full content of a page in an RSS feed. However, the page uses Composer, so it is not as simple as just reading the main body property.

So far I can fetch the full HTML of the page using:

string html;
using (var response = System.Net.WebRequest.Create(friendlyUrl).GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
    html = reader.ReadToEnd();

But that returns the entire HTML. I only want the main body content, without the navigation and so on. Essentially, I need to extract everything inside, say, <div id="content">...</div>.

I have noticed that EPiServer includes an HTML parser, EPiServer.Framework.HtmlStreamReader, but I'm not sure how to use it to extract all the HTML within a given div.
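For comparison, here is a minimal sketch of extracting the inner HTML of a div by id without any parser. It does not use EPiServer's HtmlStreamReader; it is a plain string scan that counts nested <div> and </div> tags, and it assumes reasonably well-formed markup (no "<div" text inside comments, scripts or attribute values, and no self-closing <div />). The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the inner HTML of the first <div> whose id
// attribute equals the given value, or null if it is not found or unbalanced.
// Plain string scan, not EPiServer's HtmlStreamReader.
public static class DivExtractor
{
    public static string GetDivInnerHtml(string html, string id)
    {
        // Opening tag such as <div id="content"> or <div class="x" id='content'>.
        var open = new Regex(
            "<div\\b[^>]*\\sid\\s*=\\s*[\"']" + Regex.Escape(id) + "[\"'][^>]*>",
            RegexOptions.IgnoreCase);
        Match start = open.Match(html);
        if (!start.Success)
            return null;

        int contentStart = start.Index + start.Length;

        // Matches both <div ...> (group 1 empty) and </div> (group 1 = "/").
        var tag = new Regex("<(/?)div\\b[^>]*>", RegexOptions.IgnoreCase);
        int depth = 1;

        // Walk the div tags after the opening tag, tracking nesting depth;
        // the </div> that brings depth back to zero closes our div.
        for (Match m = tag.Match(html, contentStart); m.Success; m = m.NextMatch())
        {
            depth += m.Groups[1].Value == "/" ? -1 : 1;
            if (depth == 0)
                return html.Substring(contentStart, m.Index - contentStart);
        }

        return null; // no matching </div>
    }
}
```

With the html string from the request above, `DivExtractor.GetDivInnerHtml(html, "content")` would return everything between <div id="content"> and its matching </div>, including any nested divs. Counting depth rather than stopping at the first </div> is what keeps nested Composer block markup intact.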

#62111

Oct 15, 2012 4:56

Johan Kronberg

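You could wrap the content in two HTML comments and use a Regex to capture what lies between them (the two comment markers go on either side of the (.*?) group in the pattern):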
Regex.Match(html, "(.*?)", RegexOptions.Singleline);
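A minimal sketch of the comment-marker approach, assuming the page template emits two hypothetical marker comments, <!-- rss-content-start --> and <!-- rss-content-end -->, around the main content. The marker strings and class name are illustrative only; use whatever markers your template actually renders.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentMarkerExtractor
{
    // Hypothetical marker comments; the template must emit exactly these strings.
    private const string StartMarker = "<!-- rss-content-start -->";
    private const string EndMarker = "<!-- rss-content-end -->";

    // Singleline lets '.' match newlines; the lazy (.*?) stops at the first end marker.
    private static readonly Regex Between = new Regex(
        Regex.Escape(StartMarker) + "(.*?)" + Regex.Escape(EndMarker),
        RegexOptions.Singleline);

    // Returns the HTML between the markers, or null if either marker is missing,
    // so the caller can notice (and log) when the template no longer emits them.
    public static string Extract(string html)
    {
        Match m = Between.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Returning null rather than an empty string makes a removed marker easy to detect in the feed code instead of silently publishing empty items.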

Alternatively, pass a parameter with your web request and have the page hide the parts you don't want.
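A minimal sketch of the request side of this approach, assuming a hypothetical query-string parameter named rssview. The page template would also need to check for it (for example via Request.QueryString["rssview"] in an ASP.NET page) and skip rendering navigation and other chrome; that part depends on your templates and is not shown.

```csharp
using System;

public static class RssUrl
{
    // Hypothetical parameter name; the page template must look for it.
    public const string ParameterName = "rssview";

    // Appends rssview=1 to an absolute URL, keeping any existing query string.
    public static string WithContentOnlyFlag(string friendlyUrl)
    {
        var builder = new UriBuilder(friendlyUrl);
        string flag = ParameterName + "=1";

        // UriBuilder.Query returns the leading '?'; strip it before appending.
        string existing = builder.Query.TrimStart('?');
        builder.Query = existing.Length == 0 ? flag : existing + "&" + flag;
        return builder.Uri.AbsoluteUri;
    }
}
```

The resulting URL can then be passed to WebRequest.Create in place of friendlyUrl. Using UriBuilder avoids producing a second '?' when the friendly URL already carries a query string.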

#62118

Edited, Oct 15, 2012 9:54

rachel.goldthorpe@karuba.co.nz

Thanks for the suggestions. The Regex approach is how I've managed to achieve it, but I thought the parser might provide a more elegant solution. Unfortunately, the RSS feed will break if another developer removes those specific comments from the HTML in the future, so I will need to add a warning there.

#62154

Oct 16, 2012 5:13

Is there a way to get the content out of a Composer block without having to scrape each page? Scraping is quite slow.

I am using FindPagesWithCriteria and need to find a particular block on each page to get its content.

Jon

#151539

Jul 27, 2016 14:07