Getting full page content for RSS from a page with Composer

I need to output the full content of a page in an RSS feed. However, the page uses Composer, so it is not as simple as reading the main body property.

So far I can pull the full HTML for the page using:

string html;
using (var response = System.Net.WebRequest.Create(friendlyUrl).GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
    html = reader.ReadToEnd();

But that pulls the entire HTML document. I want only the main body content, without the navigation and so on. So essentially I need everything inside, say, <div id="content">...</div>

I have noticed there is an HTML parser in EPiServer, EPiServer.Framework.HtmlStreamReader, but I'm not sure how to use it to extract all the HTML within a given div.
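
For illustration, pulling out the contents of a given div can be sketched without a parser by counting nested div tags. This is only a rough sketch using the standard .NET Regex class, not HtmlStreamReader (its API is not shown in this thread). The names DivExtractor and InnerHtmlOfDiv are made up for the example, and it assumes reasonably well-formed markup with no div tags hidden inside comments or scripts.

```csharp
using System;
using System.Text.RegularExpressions;

static class DivExtractor
{
    // Hypothetical helper: returns the inner HTML of the first <div> whose id
    // attribute equals the given id, or null when no such div is found or the
    // div tags after it are not balanced.
    public static string InnerHtmlOfDiv(string html, string id)
    {
        // Locate the opening tag, e.g. <div id="content"> or <div class="x" id='content'>.
        Match open = Regex.Match(
            html,
            "<div\\b[^>]*\\sid\\s*=\\s*[\"']" + Regex.Escape(id) + "[\"'][^>]*>",
            RegexOptions.IgnoreCase);
        if (!open.Success) return null;

        int start = open.Index + open.Length;
        string rest = html.Substring(start);
        int depth = 1; // we are inside the target div

        // Walk every opening and closing div tag after the target's opening tag.
        foreach (Match tag in Regex.Matches(rest, "<(/?)div\\b[^>]*>", RegexOptions.IgnoreCase))
        {
            bool isClosing = tag.Groups[1].Length > 0;
            depth += isClosing ? -1 : 1;
            if (depth == 0)
            {
                // tag.Index is relative to 'rest', so it is also the inner length.
                return rest.Substring(0, tag.Index);
            }
        }
        return null; // unbalanced markup
    }
}
```

Called as DivExtractor.InnerHtmlOfDiv(html, "content"), this returns everything between the opening and matching closing tag, including any nested divs. A real HTML parser would be more robust against malformed markup.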

 

#62111
Oct 15, 2012 4:56
You could add two HTML comments and select what's in between with a Regex:
Regex.Match(html, "<!-- start -->(.*?)<!-- /stop -->", RegexOptions.Singleline);

Or pass a parameter with your web request and use it to hide the parts you don't want.
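
The comment-marker approach could look roughly like the sketch below. It assumes the page template wraps the main content in the two comments from the Regex above. FeedBody and Extract are names invented for the example, and returning null when the markers are missing is just one possible way to handle that case.

```csharp
using System;
using System.Text.RegularExpressions;

static class FeedBody
{
    // Marker comments assumed to wrap the main content in the page template.
    const string Pattern = "<!-- start -->(.*?)<!-- /stop -->";

    // Returns the markup between the two markers, or null when they are missing.
    public static string Extract(string html)
    {
        // Singleline lets '.' match newlines, so the content may span many lines.
        Match match = Regex.Match(html, Pattern, RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }
}
```

For example, FeedBody.Extract("<nav>menu</nav><!-- start --><p>Hello</p><!-- /stop -->") returns "<p>Hello</p>". The null result gives the feed code a place to log a warning or fall back if the markers have been removed.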

#62118
Edited, Oct 15, 2012 9:54
Thanks for the suggestions. The Regex is how I have managed to achieve it so far, but I thought the parser might provide a more elegant solution. Unfortunately, the RSS feed will break if another developer removes those specific comments from the HTML in the future, so I will need to put a warning in there.

#62154
Oct 16, 2012 5:13
Is there a way to get the content out of a Composer block without having to scrape each page? Scraping is quite slow.

I am using FindPagesWithCriteria and need to find a particular block on the page to get the content from.

Jon

#151539
Jul 27, 2016 14:07