PageSearch performance and accuracy (many questions)

After experimenting a bit with the PageSearch web control, I've found that any constraints (PropertyCriterias) seem to be applied as a post-database operation. In our case this is quite severe from a performance perspective. We want to limit our search to pages of one specific page type, but still use the search query function. We have many thousands of pages, and if you set the MaxAllowHits parameter to 100 you end up with a net result of about 2 hits for a common search keyword.

Example: a search for "stock" generates many hits. I've set the search query and added a required property criteria (PageTypeID=X). What happens is that MaxAllowHits hits are fetched from the database (thus missing out on MANY hits), and then 99% of them are discarded because they are not of page type X. Result: few hits = not good.

Question: are the top-ranked ones returned, at least? "Raise the MaxAllowHits parameter, then", you say. Yes, feasible, but then performance degrades. Isn't there some way to apply the constraints already in the SQL query executed by EPiServer?

Second question: what does it actually mean to set a property to searchable in admin mode? Does Index Server kick in? Are SQL indexes generated? From an EPiServer SQL function?

Third question: using automatic paging in the PageSearch control, I see that ViewState isn't approaching really high levels. Does that mean that SQL is executed to fetch, say, posts 21-40 when the search page changes? If so, is that accomplished by using a temporary table in the SQL? Is there any way to change this behavior? Or is the page data collection cached for some time in EPiServer rather than written to view state? If so, for how long?

Regards
Sep 13, 2006 9:43
-- You're right, this (post-DB filtering) is by design in EPi. When it comes to a large number of pages, and you in some way need to find pages from the whole page tree structure (or a large portion of it), my experience is that custom stored procedures are needed in the database.
-- Because the database structure is designed for flexibility, not performance, the different collections of pages fetched in the system are cached in the HttpContext.Cache object on the server. How long the cache lives depends on many things: memory, frequency of use, and one thing you can edit in web.config, the appSettings key EPnPageCacheTimeout. This defaults to 12 hours. The cache is cleared when new pages are published in some way or another.

Hope this will guide you. BTW, take a look at Steve's article about using the Cache, published in the most recent technical newsletter. /HAXEN
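
For anyone following the thread, here is a minimal sketch of why filtering after the database call loses hits, and what moving the constraint into the query changes. It is not EPiServer code: it uses Python's built-in sqlite3 with a made-up `pages` table, a hypothetical page type ID of 7, and a rank that simply equals the page ID. The only thing taken from the discussion above is the idea of a hit cap (MaxAllowHits = 100) applied before versus after the page type constraint.

```python
import sqlite3

MAX_ALLOW_HITS = 100   # stands in for the MaxAllowHits setting discussed above
WANTED_TYPE = 7        # hypothetical PageTypeID, chosen for illustration only

# Made-up data: 10,000 pages that all match "stock"; every 50th page is of
# the wanted type. The rank is simply the page id (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages ("
    "id INTEGER PRIMARY KEY, page_type_id INTEGER, rank INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [(i, WANTED_TYPE if i % 50 == 0 else 1, i, "stock report")
     for i in range(1, 10001)],
)


def search_then_filter(keyword):
    """Cap the hits in the database, then apply the page type constraint."""
    rows = conn.execute(
        "SELECT id, page_type_id FROM pages "
        "WHERE body LIKE ? ORDER BY rank DESC LIMIT ?",
        (f"%{keyword}%", MAX_ALLOW_HITS),
    ).fetchall()
    return [page_id for page_id, page_type in rows if page_type == WANTED_TYPE]


def search_with_constraint(keyword):
    """Apply the page type constraint inside the query, then cap the hits."""
    rows = conn.execute(
        "SELECT id FROM pages "
        "WHERE body LIKE ? AND page_type_id = ? ORDER BY rank DESC LIMIT ?",
        (f"%{keyword}%", WANTED_TYPE, MAX_ALLOW_HITS),
    ).fetchall()
    return [page_id for (page_id,) in rows]


post_filtered = search_then_filter("stock")
in_query = search_with_constraint("stock")
print(len(post_filtered), len(in_query))  # prints "2 100"
```

In this toy data the two hits that survive post-filtering are also the two top-ranked hits of the constrained query, but that only holds because the cap is applied after ranking here; whether the real control behaves that way is exactly the open question above. A custom stored procedure, as suggested, would play the role of `search_with_constraint`.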
Sep 14, 2006 8:29