We have our own indexer that indexes all pages on our site. We get the pages using FindPagesWithCriteria, with multiple criteria to speed up the indexing. We have noticed that pages with an unpublished parent are included in the result, and we don't want those. Is there any way to remove pages that have an unpublished parent anywhere up the page tree?
I don't think so, not in the query. You have to post-filter them somehow.
But if you start crawling up the page tree for every page, you will probably pull most pages on the site into the cache (cache poisoning), which is not good if you run this crawl often.
One approach might be to find all unpublished pages, get their descendant pages (DataFactory.GetDescendents, which will not pull pages into the cache), and intersect this with your current result set to find the pages you should exclude.
Too bad there isn't a built-in way to do this.
Any nice tip for converting the result of DataFactory.Instance.GetDescendents to a PageDataCollection?
Well, PageDataCollection has a constructor that takes an IEnumerable, which you can get by projecting the collection you already have (this needs using System.Linq), something like:
// descendents contains the PageReferences returned by GetDescendents
PageDataCollection pages = new PageDataCollection(descendents.Select(r => DataFactory.Instance.GetPage(r)));
But the point is that, in this case, you shouldn't do that unless you have to, because it will pull all the pages into the cache for no reason. Instead, just compare the PageReferences from GetDescendents with the PageReferences of the PageData objects in your collection (through PageData.PageLink).
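To make the comparison concrete, here is a rough sketch of the filtering step. It is deliberately written against plain page IDs so it runs without EPiServer. The class name, method name, and all parameters are made up for illustration, and how you collect the unpublished pages in the first place is left open.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnpublishedBranchFilter
{
    // Returns the candidates that are neither an unpublished page themselves
    // nor a descendant of one. Every name here is hypothetical.
    public static List<TPage> ExcludeUnpublishedBranches<TPage>(
        IEnumerable<TPage> candidates,
        Func<TPage, int> getId,
        IEnumerable<int> unpublishedIds,
        Func<int, IEnumerable<int>> getDescendantIds)
    {
        // Collect the IDs of every unpublished page and everything below it.
        var excluded = new HashSet<int>();
        foreach (int rootId in unpublishedIds)
        {
            excluded.Add(rootId);
            excluded.UnionWith(getDescendantIds(rootId));
        }

        // Keep only candidates whose ID is outside the excluded set.
        // Only IDs are compared, so no extra page data is loaded here.
        return candidates.Where(p => !excluded.Contains(getId(p))).ToList();
    }
}
```

In an EPiServer setup, I would expect getId to be something like p => p.PageLink.ID, and getDescendantIds to wrap DataFactory.Instance.GetDescendents(new PageReference(id)).Select(r => r.ID), but treat that wiring as an assumption to check against your version. The HashSet keeps each lookup cheap, which matters when the result set is large.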
Thank you for the help Magnus.
Got it working now.