Maybe you could find some inspiration in Ted's blog post about EPiServer Search:
http://tedgustaf.com/blog/2013/4/add-custom-fields-to-the-episerver-search-index-with-episerver-7/
Thanks Henrik! That pointed me in the right direction.
It turned out that the method EPiServer.Search.IndexingService.IndexingServiceHandler.HandleDataUri was the main culprit. It receives a DataUri parameter, which is a path to the file content, and it handles reading the file and populating the index item. All of this is very slow with big files. The method is executed before the IndexingService.DocumentAdding event is raised, so handling that event does little good here.
The easiest way to solve this seemed to be overriding EPiServer.Search.SearchHandler so that it does not pass the DataUri parameter at all, and then configuring StructureMap to use my own implementation instead of the default one.
using EPiServer.Search;

namespace Solita.Web.Utils.Performance
{
    public class NonFileIndexingSearchHandler : SearchHandler
    {
        public override void UpdateIndex(IndexRequestItem item)
        {
            UpdateIndex(item, null);
        }

        public override void UpdateIndex(IndexRequestItem item, string namedIndexingService)
        {
            // Never index file content; it is too slow.
            // Clearing DataUri means the indexing service has no file to read.
            item.DataUri = null;
            base.UpdateIndex(item, namedIndexingService);
        }
    }
}
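For anyone wondering about the StructureMap part, here is a rough sketch of how the registration might look. It assumes an EPiServer 7 site where the container is configured through an IConfigurableModule. The module name and its namespace are made up for illustration, and the exact interface members can differ between EPiServer versions, so treat it as a starting point rather than the definitive setup.

```csharp
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using EPiServer.Search;
using EPiServer.ServiceLocation;
using Solita.Web.Utils.Performance;

namespace Solita.Web.Initialization // hypothetical namespace, not from the original post
{
    // Hypothetical module name. Assumes the EPiServer 7 initialization system.
    [InitializableModule]
    [ModuleDependency(typeof(ServiceContainerInitialization))]
    public class SearchHandlerRegistration : IConfigurableModule
    {
        public void ConfigureContainer(ServiceConfigurationContext context)
        {
            // Tell StructureMap to resolve SearchHandler to the custom
            // implementation that clears DataUri before indexing.
            context.Container.Configure(c =>
                c.For<SearchHandler>().Use<NonFileIndexingSearchHandler>());
        }

        public void Initialize(InitializationEngine context) { }

        public void Uninitialize(InitializationEngine context) { }

        // Part of the module interface in EPiServer 7; assumed here,
        // later versions may not require it.
        public void Preload(string[] parameters) { }
    }
}
```

With something like this in place, file metadata should still be indexed as before; only the file content is skipped, because the handler never passes a DataUri on.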
Hi
For some godforsaken reason EPiServer.Search indexes XML and CSV files for full-text search. When the XML files are large this takes a lot of CPU on the server, and I think it's just ridiculous to assume that editors would wish to search XML file content.
Documentation for EPiServer.Search is a joke. Is there an easy way to prevent indexing of XML and CSV files by extension or by file size?