I have a site that hosts multiple blogs and I'm trying to improve the code that retrieves them because as we are now getting more blog posts, it's proving to be very inefficient. The goal is to retrieve the most recent N posts where depending on the page and configuration, N would probably be in the 3-30 range.
What it does now is get a collection of ContentReferences to all blog posts on the entire site. This will be thousands upon thousands of results and always growing. Then it filters these down to posts that are descendants of a given parent ContentReference; in other words, it filters them down to the specific, single blog we want posts for. At that point it loads the actual BlogPost PageData for all posts belonging to the blog so that it can filter on StopPublish and sort on StartPublish.
I'm not sure how big of a hit to the server the first step of querying all blog posts and then filtering it is, but it's a lot of work that doesn't need to be done. And then loading the actual content item for all posts so that we can filter and sort by date is a very heavy task. For active bloggers, it might load 1000 posts just to display the most recent 3 and do that every time the page loads.
To me, it would make sense to start with the ContentReference of the blog we're interested in and only look for posts that are descendants of it. But that still leaves me with the problem of needing to load all of its posts in order to filter and sort by date. How can I more efficiently get the most recent N unexpired posts? Can it be done with or without EPiServer Find (something I'm only just learning about)?
Use GetChildren like you are hinting at. That's a cached method, so the cost of the actual call is not huge. Sort by date using LINQ, cache the result, and reuse it in subsequent requests to avoid taking the hit for sorting all over again. The first user will take a small hit, but it shouldn't be too bad. You can use a scheduled job to refresh the cache if you wish to avoid that. I do that sometimes for really slow methods so that a cached result is always ready. Make sure you don't transform those thousands of pages to another DTO or similar before you filter them, so you don't burn memory unnecessarily.
Episerver Find will of course make that sorting etc even quicker but it's a bit overkill to buy for just this. That's my 2 cents...
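For what it's worth, here is a rough sketch of that load, filter, sort, cache idea in plain C#. It deliberately does not call the Episerver API: `Post` is a stand-in for the BlogPost page, `loadPosts` is a hypothetical delegate standing in for whatever enumerates one blog's posts (GetChildren, for example), and `LatestPostsCache` is a made-up name. It keeps only the IDs of the top N in its own cache, which is an assumption on my part rather than something Episerver requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a BlogPost page: only the fields needed for filtering and sorting.
public sealed record Post(int Id, DateTime StartPublish, DateTime? StopPublish);

public sealed class LatestPostsCache
{
    // Hypothetical loader: given a blog ID, returns that blog's posts.
    private readonly Func<int, IEnumerable<Post>> _loadPosts;
    private readonly TimeSpan _ttl;
    private readonly Dictionary<(int BlogId, int Count), (DateTime Expires, IReadOnlyList<int> Ids)> _cache = new();
    private readonly object _gate = new();

    public LatestPostsCache(Func<int, IEnumerable<Post>> loadPosts, TimeSpan ttl)
    {
        _loadPosts = loadPosts;
        _ttl = ttl;
    }

    // Returns the IDs of the newest `count` published, unexpired posts of one blog.
    public IReadOnlyList<int> GetLatest(int blogId, int count, DateTime now)
    {
        lock (_gate)
        {
            // Serve from the cache while the entry is still fresh.
            if (_cache.TryGetValue((blogId, count), out var hit) && hit.Expires > now)
                return hit.Ids;

            // Filter first, then sort, and keep only the IDs of the top N.
            var ids = _loadPosts(blogId)
                .Where(p => p.StartPublish <= now)
                .Where(p => p.StopPublish == null || p.StopPublish > now)
                .OrderByDescending(p => p.StartPublish)
                .Take(count)
                .Select(p => p.Id)
                .ToList();

            _cache[(blogId, count)] = (now + _ttl, ids);
            return ids;
        }
    }
}
```

A short time-to-live bounds how stale the list can get, for example when a post expires or a scheduled post goes live. A scheduled job could call `GetLatest` ahead of time so no visitor pays for the first sort.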
Thanks Daniel! My worry with that is memory usage because it would keep thousands of blog post pages cached that won't ever actually be needed, won't it? My understanding is GetChildren<BlogPost>() will retrieve and cache all BlogPosts, even those from 10 years ago (if the site had been around that long). As multiple blogs are visited, the blog posts cached will approach all posts ever. Can that really be considered negligible? Ideally we would only retrieve and cache the set that's needed. We do have the post pages categorized under year/month/day folders so I guess I could write something to step back in time through folders until the number required has been retrieved.
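To make the "step back in time through folders" idea concrete, here is a minimal sketch in plain C#, not Episerver code. `Node` is a stand-in for a page in the tree, and `getChildren` is a hypothetical delegate playing the role of a GetChildren call. It assumes folder names are numeric (year, month, day), that a folder holds either only folders or only posts, and that the folder a post lives in matches its StartPublish date.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a page in the tree: a date folder or a blog post.
public sealed class Node
{
    public string Name = "";            // "2015", "06", "23" for folders
    public DateTime? StartPublish;      // set on posts only
    public DateTime? StopPublish;       // set on posts that expire
    public List<Node> Children = new();
    public bool IsPost => StartPublish.HasValue;
}

public static class NewestFirstWalker
{
    // Lazily yields posts newest first by visiting date folders in descending order.
    // Children of a folder are only requested when the walk actually reaches it.
    public static IEnumerable<Node> Walk(Node node, Func<Node, IReadOnlyList<Node>> getChildren)
    {
        if (node.IsPost)
        {
            yield return node;
            yield break;
        }

        var children = getChildren(node);
        var ordered = children.All(c => c.IsPost)
            ? children.OrderByDescending(c => c.StartPublish)       // posts within a day
            : children.OrderByDescending(c => int.Parse(c.Name));   // year/month/day folders

        foreach (var child in ordered)
            foreach (var post in Walk(child, getChildren))
                yield return post;
    }

    // Takes the newest `count` published, unexpired posts and then stops walking.
    public static List<Node> Latest(Node blogRoot, int count, DateTime now,
                                    Func<Node, IReadOnlyList<Node>> getChildren) =>
        Walk(blogRoot, getChildren)
            .Where(p => p.StartPublish <= now && (p.StopPublish == null || p.StopPublish > now))
            .Take(count)
            .ToList();
}
```

Because the walk is lazy and `Take` stops pulling once it has enough, older year and month folders are never asked for their children, so ten-year-old posts would not be loaded just to show the latest three.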
As long as you don't transform them to another object before you have filtered then I think you'll be fine.
Measure it and see if it's a problem :)
Another way is to store the actual content references on some ancestor for the latest X posts. You can easily do that in the published event.
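A minimal sketch of that bookkeeping, in plain C# rather than real Episerver code: `PostRef` stands in for a content reference plus its publish date, and `LatestPostsList` is a hypothetical holder for what would be stored on the ancestor page. `OnPublished` is meant to be called from whatever publish handler you wire up; the wiring itself is left out here.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a content reference together with its publish date.
public sealed record PostRef(int Id, DateTime StartPublish);

// Holds the newest `capacity` post references for one blog.
public sealed class LatestPostsList
{
    private readonly int _capacity;
    private readonly List<PostRef> _items = new();

    public LatestPostsList(int capacity) => _capacity = capacity;

    // Newest first.
    public IReadOnlyList<PostRef> Items => _items;

    // Insert or update the post, then keep the list sorted and trimmed.
    public void OnPublished(PostRef post)
    {
        _items.RemoveAll(p => p.Id == post.Id);   // a republish replaces the old entry
        _items.Add(post);
        _items.Sort((a, b) => b.StartPublish.CompareTo(a.StartPublish));
        if (_items.Count > _capacity)
            _items.RemoveRange(_capacity, _items.Count - _capacity);
    }
}
```

Expired or deleted posts still have to be skipped when the list is read, so it is probably worth keeping the capacity somewhat larger than the N you display.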
I would recommend using search (Find, the standard search, or a custom one) to fetch the latest blog posts.
If that's not an option, you can check out the FindPagesWithCriteria method.
Some time ago I blogged about a custom search engine based on Lucene.NET.
There's an article on how to get the latest 5 blogposts: http://dcaric.com/blog/extending-episerver-search-part-2
It doesn't have all the features you need, but the code is available on GitHub, so you can easily modify it :)