Using FindPagesWithCriteria multiple times

Hi folks

We're about to build a new home page that will pull in various pages from all over our website. In particular, we want to get the latest x posts from each of our various blogs (a custom blog, not the off-the-shelf EPiServer version). As the blogs are scattered over the site, I imagine I'll need to use FindPagesWithCriteria starting at the site root to find the blogs, and then, for each blog, use FindPagesWithCriteria starting at that blog's root to find its posts, which I will then need to sort, returning the number specified (likely to be only one or two posts per blog).

Of course, if I only wanted to get the latest blog posts overall, I could simply search for blog posts starting at the site root; but because I need to find the latest posts from each blog, I assume this means I'll need to use FindPagesWithCriteria multiple times as described above.
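To make the two-step lookup concrete, here is a minimal sketch in Python rather than the EPiServer API. Everything in it is a stand-in I've assumed for illustration: `Page`, `find_pages` (playing the role of a criteria search from a given root) and `latest_posts_per_blog` are hypothetical names, and the tree is simplified so that posts carry their own "post" type.

```python
from dataclasses import dataclass
from datetime import date
import heapq


@dataclass(frozen=True)
class Page:
    page_id: int
    parent_id: int
    page_type: str
    published: date


def find_pages(pages, root_id, page_type):
    """Stand-in for a criteria search: descendants of root_id with the given type."""
    children = {}
    for page in pages:
        children.setdefault(page.parent_id, []).append(page)
    found, stack = [], [root_id]
    while stack:
        for child in children.get(stack.pop(), []):
            if child.page_type == page_type:
                found.append(child)
            stack.append(child.page_id)
    return found


def latest_posts_per_blog(pages, site_root_id, count):
    """One search for the blogs, then one search per blog for its posts."""
    result = {}
    for blog in find_pages(pages, site_root_id, "blog"):
        posts = find_pages(pages, blog.page_id, "post")
        # Keep only the newest `count` posts of this blog.
        result[blog.page_id] = heapq.nlargest(count, posts, key=lambda p: p.published)
    return result


pages = [
    Page(2, 1, "section", date(2010, 1, 1)),
    Page(10, 2, "blog", date(2010, 1, 1)),
    Page(20, 1, "blog", date(2010, 1, 1)),
    Page(11, 10, "post", date(2010, 9, 1)),
    Page(12, 10, "post", date(2010, 9, 15)),
    Page(13, 10, "post", date(2010, 8, 1)),
    Page(21, 20, "post", date(2010, 7, 4)),
]
latest = latest_posts_per_blog(pages, site_root_id=1, count=2)
```

The point of the sketch is the cost: one search for the blogs plus one search per blog, which is why the number of searches grows with the number of blogs.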

From previous experience, using FindPagesWithCriteria multiple times on the one page has a serious impact on performance (even though we cache all calls to FindPagesWithCriteria). Does anyone know if there's a better way to go about retrieving multiple page types from multiple places? The above is just a small example of what we'd like to do - it is likely that we'll want to try and retrieve a bunch of other pages of various criteria on the same page too.

Any help would be greatly appreciated.

#43641

Sep 21, 2010 4:00

Magnus Rahl

You could cache your results somehow. Cache the PageReferences, not the PageDatas (those are already cached, and that cache is managed for you: entries are cleared when a page is updated, etc.). You could cache the list of blogs under a cache key that you clear when a new page of that type is published (from a global event handler for the PublishedPage event).

Then you can do the same for each blog's blog posts (vary the cache key by the blog's page ID, etc.). Same thing here: set up a way to flush the cache when a new blog post is created (for best results, find a way to clear only the affected blog's posts when a new post is created).
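A rough sketch of that keying and flushing scheme, written in Python as a language-neutral model and not as EPiServer code. `ReferenceCache`, `posts_key`, `on_page_published` and `load_posts` are all hypothetical names; in a real site the handler would be wired to the CMS publish event and the loader would be the actual search.

```python
class ReferenceCache:
    """Caches lists of page IDs (references), never the page objects themselves."""

    def __init__(self):
        self._entries = {}

    def get_or_load(self, key, loader):
        # Run the expensive loader only when the key is missing.
        if key not in self._entries:
            self._entries[key] = list(loader())
        return self._entries[key]

    def invalidate(self, key):
        self._entries.pop(key, None)


BLOG_LIST_KEY = "blogs"


def posts_key(blog_id):
    """Cache key that varies by the blog's page ID."""
    return f"blog-posts:{blog_id}"


def on_page_published(cache, page_type, blog_id=None):
    """Hypothetical publish handler: flush only the entries the new page affects."""
    if page_type == "blog":
        cache.invalidate(BLOG_LIST_KEY)
    elif page_type == "post" and blog_id is not None:
        cache.invalidate(posts_key(blog_id))


load_calls = []  # records every "expensive" lookup


def load_posts(blog_id):
    load_calls.append(blog_id)
    return [blog_id * 10 + 1, blog_id * 10 + 2]


cache = ReferenceCache()
cache.get_or_load(posts_key(1), lambda: load_posts(1))  # loads blog 1
cache.get_or_load(posts_key(1), lambda: load_posts(1))  # served from cache
cache.get_or_load(posts_key(2), lambda: load_posts(2))  # loads blog 2
on_page_published(cache, "post", blog_id=1)             # flushes blog 1 only
cache.get_or_load(posts_key(1), lambda: load_posts(1))  # reloads blog 1
cache.get_or_load(posts_key(2), lambda: load_posts(2))  # blog 2 still cached
```

Keeping one key per blog is what makes the targeted flush possible: publishing a post in one blog leaves every other blog's cached list untouched.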

How does the structure of each blog look? If it is relatively flat (posts are direct children of the blog, or sit in one or a few containers, or even a few levels deeper), recursing the tree with GetChildren and filtering by page type might be a better alternative than FindPagesWithCriteria for getting the blog posts. The reason is that it is not only PageDatas that are cached and cache-managed; child page references are too. So the second time you recurse the tree, chances are it will result in zero database queries.

For large trees FindPagesWithCriteria is better, because you probably don't want to pull the whole page tree into the cache. With FindPagesWithCriteria only the matching pages are constructed from the database and cached.
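To illustrate the GetChildren recursion and why the second pass is cheap, here is a small Python model. It is only a sketch under assumptions: `PageTree` is a hypothetical stand-in whose `get_children` caches child listings and counts simulated database hits, and the "month" and "post" type names are made up for the example.

```python
class PageTree:
    """Stand-in for a CMS whose child listings are cached after the first read."""

    def __init__(self, children_by_parent, type_by_id):
        self._children = children_by_parent
        self._types = type_by_id
        self._child_cache = {}
        self.db_queries = 0  # counts simulated database hits

    def get_children(self, parent_id):
        if parent_id not in self._child_cache:
            self.db_queries += 1
            self._child_cache[parent_id] = tuple(self._children.get(parent_id, ()))
        return self._child_cache[parent_id]

    def page_type(self, page_id):
        return self._types[page_id]


def find_by_type(tree, root_id, wanted_type):
    """Recurse the tree below root_id and keep the pages of the wanted type."""
    matches = []
    for child_id in tree.get_children(root_id):
        if tree.page_type(child_id) == wanted_type:
            matches.append(child_id)
        matches.extend(find_by_type(tree, child_id, wanted_type))
    return matches


tree = PageTree(
    children_by_parent={10: [100, 101], 100: [1001, 1002], 101: [1011]},
    type_by_id={100: "month", 101: "month", 1001: "post", 1002: "post", 1011: "post"},
)
first = find_by_type(tree, 10, "post")
queries_after_first = tree.db_queries
second = find_by_type(tree, 10, "post")  # answered entirely from the child cache
```

Note that the first pass touches every node under the blog, which is the memory cost mentioned above for large trees.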

#43651

Sep 21, 2010 7:59

Magnus Rahl

Here's an interesting blog post with insights into FindPagesWithCriteria.

#43652

Sep 21, 2010 8:02

Thanks for that, Magnus. We were already using caching that someone set up many EPiServers ago, but your advice caused me to look at it a little more closely, and I've been able to improve that side of things somewhat. But it was the idea of hooking into the PublishedPage event (and the MovedPage event, to capture deleted pages) that has enabled the cache to be updated automatically. While I haven't tried the new method with the blogs yet, I used it with another portal page I built earlier, and the page load has improved from around seven seconds to almost instantaneous. Brilliant.

As for the blog, the structure is such that there's a page of type "blog" that represents the home page of the blog, and each month a new page of type "blog" is created under this as the holding page for that month's posts. As a result, the blog structure is not quite as flat as a traditional blog, and so using GetChildren would require a bunch of calls (i.e. first to get the holding pages from the parent, and again to get the blog posts from the holding pages). Therefore, I reckon that using FindPagesWithCriteria is still the way to go, but as I only need to read the latest three or so posts from each blog, I can simply store these in a PageDataCollection in the cache and will only need to update the cache when someone publishes a new post. This is clearly much faster than pulling back all of the blog posts, sorting them and taking the top three, and doing that for each blog every time someone views the home page.

Thanks again for your help. The blogging juggernaut can now roll on.

#43760

Sep 23, 2010 3:10

Magnus Rahl

Sounds like you've got things under control, great! Just a few more tips:

Don't cache a PageDataCollection. Create a helper or extension method that transforms the collection into an array of PageReferences, and another that does the opposite. Why? If a PageData is updated, your cached collection will still hold a reference to the old object. By storing a PageReference and getting the PageData from DataFactory, you always get the latest version, either from cache or, if the page has changed since the last fetch, from the database (the vast majority of the time it comes from cache, even if it was updated since you last requested it). This is probably not a big problem in your situation; consider it more general advice.
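The stale-object problem can be shown with a tiny Python model. This is only a sketch: `PageStore` is a hypothetical stand-in for the CMS's own managed page cache, and `to_references` / `to_pages` are illustrative names for the two helper methods, not existing API.

```python
class PageStore:
    """Stand-in for the CMS's managed page cache: get() returns the current version."""

    def __init__(self):
        self._pages = {}

    def save(self, page_id, title):
        # Saving replaces the stored object, as an update would.
        self._pages[page_id] = {"id": page_id, "title": title}

    def get(self, page_id):
        return self._pages[page_id]


def to_references(pages):
    """Collection of pages -> list of references (here, plain IDs)."""
    return [page["id"] for page in pages]


def to_pages(store, references):
    """List of references -> current page objects, resolved at read time."""
    return [store.get(reference) for reference in references]


store = PageStore()
store.save(1, "First draft")

cached_objects = [store.get(1)]               # what caching the collection keeps
cached_refs = to_references(cached_objects)   # what caching references keeps

store.save(1, "Final title")                  # the page is edited and republished

stale_title = cached_objects[0]["title"]      # still the old object
fresh_title = to_pages(store, cached_refs)[0]["title"]
```

Here the cached objects keep pointing at the old version, while the cached references resolve to whatever the store currently holds.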

About finding blogs, I can't say what would be more efficient in your situation. My guess is that if each blog has tens of month containers, each containing tens of blog posts, it's basically a tradeoff between database calls and memory usage. But just as an observation: if you're looking for the latest x posts, can't you use the fact that those should be sorted into a certain month container? In many cases you would probably only have to include the last container's posts in your collection before sorting by date and picking the top ones. Even if the posts are scarce and you get more GetChildren calls and somewhat more constructed PageDatas, both of those are really cheap once the child lists and pages have been cached on the first call.

As for the in-memory sorting, the problem with getting many PageDatas is not performance in the sorting, which will probably be really fast unless you have tens of thousands of objects, but that you pull a lot of "useless" PageDatas into the cache. But because of this hierarchical date structure you can avoid that.
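A sketch of the "newest month container first" idea, again in Python as a neutral model. It assumes every post is filed in the container for the month it was published in, so posts in a newer container are always newer than posts in an older one. The function name and the tuple layout are hypothetical.

```python
from datetime import date


def latest_posts(month_containers, count):
    """Return the newest `count` posts and how many containers had to be read.

    month_containers: list of (month_start, [(post_id, published), ...]).
    """
    collected = []
    containers_read = 0
    # Walk the containers newest first and stop as soon as there are enough posts.
    for _, posts in sorted(month_containers, key=lambda c: c[0], reverse=True):
        containers_read += 1
        collected.extend(posts)
        if len(collected) >= count:
            break
    collected.sort(key=lambda post: post[1], reverse=True)
    return collected[:count], containers_read


months = [
    (date(2010, 7, 1), [(1, date(2010, 7, 3)), (2, date(2010, 7, 20))]),
    (date(2010, 8, 1), [(3, date(2010, 8, 9))]),
    (date(2010, 9, 1), [(4, date(2010, 9, 2)), (5, date(2010, 9, 18))]),
]
top_three, containers_read = latest_posts(months, count=3)
```

With this data the July container is never read, so its posts are never constructed or sorted.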

I think I've long ago left the real-world issue here and gone into the academic field of performance. If FindPages is sufficiently fast for this scenario, use it. One should always write code carefully, but not be overly analytic about performance unless the solution actually proves to suffer from performance issues.

Good luck!

#43761

Sep 23, 2010 6:53