Hi,
We are using SearchDataSource for searching pages and documents in one of our projects.
It works well with the default Documents directory configured in web.config (using VirtualPathVersioningProvider).
We implemented a custom Virtual Path Provider (it doesn't support versioning) that stores files in a database, and the problem is that searching doesn't work for the files in the database. Is there a way to get the EPiServer Indexing Service to index those files?
I couldn't find any information about this problem.
Thanks in advance
Lubomir Mosholov
ProPeople
Hi Lubomir!
EPiServer uses both an open source library (Lucene) and Microsoft Indexing Service to create the search index for files.
In EPiServer 4, Microsoft Indexing Service was responsible for building an index for ordinary files (i.e. the upload folder), and EPiServer Indexing Service was needed to get the versioned files in the documents folder indexed. The reason is that the path and filename were stored in the database and the content in a file with a GUID as its name.
In EPiServer CMS, all files are stored using the VirtualPathVersioningProvider (which always stores the path and filename in the database and the content as a file with a GUID as its name). For this reason, the EPiServer Indexing Service must be running, and the web site must be configured to use it, if you want to search files.
So how do the keywords get into the index? You cannot just take the content of a binary Word document or PDF file. The binary file must be converted to text first, and for this EPiServer relies on a part of Microsoft Indexing Service. Applications can register converters that implement a COM interface (IFilter), and this is used by Microsoft Indexing Service, SharePoint, EPiServer or any application interested in getting the text out of a binary document.
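The converter idea can be illustrated with a small sketch. It is written in Python purely for brevity and is not the IFilter COM interface: the names `register_converter` and `extract_text`, and the toy plain-text converter, are hypothetical. It only shows how an indexer might look up a converter by file type and skip formats nobody has registered.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: file extension -> function turning raw bytes into text.
_converters: Dict[str, Callable[[bytes], str]] = {}


def register_converter(extension: str, converter: Callable[[bytes], str]) -> None:
    """Register a converter for one file extension (case-insensitive)."""
    _converters[extension.lower()] = converter


def extract_text(filename: str, content: bytes) -> Optional[str]:
    """Return indexable text, or None when no converter handles the file type."""
    dot = filename.rfind(".")
    extension = filename[dot:].lower() if dot != -1 else ""
    converter = _converters.get(extension)
    return converter(content) if converter else None


# Toy converter for plain text; real converters would parse Word or PDF binaries.
register_converter(".txt", lambda data: data.decode("utf-8", errors="replace"))
```

A file type without a registered converter yields no text, which is why such documents never show up in search results.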
You can have a look at the implementation with Lutz Roeder's .NET Reflector if you load EPiServer.IndexingService.exe and look at the class EPiServer.IndexingService.Indexers.FileItemIndexer.
So with this knowledge, is it possible to solve your problem?
No, I think it will be hard (at least with the current version), because a little more analysis reveals that EPiServer Indexing Service does not ask the VirtualPathProvider class for the content of the file. Instead, it has hardcoded knowledge of the physical location used by the VirtualPathVersioningProvider (see EPiServer.IndexingService.ItemIndexerManager.CreateDocument).
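The limitation can be pictured with a simplified sketch (Python, for brevity; every class and function name here is hypothetical and none of it is EPiServer code). An indexer that reads from a fixed physical folder never sees files held by another provider, whereas one that asked the provider for the content would.

```python
import os
import tempfile
from typing import Dict, Optional


class DatabaseProvider:
    """Stand-in for a custom provider that keeps file content in a database."""

    def __init__(self, rows: Dict[str, bytes]) -> None:
        self._rows = rows

    def open(self, virtual_path: str) -> Optional[bytes]:
        return self._rows.get(virtual_path)


def read_hardcoded(physical_root: str, virtual_path: str) -> Optional[bytes]:
    """Indexer that assumes content lives on disk under a known folder."""
    candidate = os.path.join(physical_root, virtual_path.lstrip("/"))
    if not os.path.isfile(candidate):
        return None
    with open(candidate, "rb") as handle:
        return handle.read()


def read_via_provider(provider: DatabaseProvider, virtual_path: str) -> Optional[bytes]:
    """Indexer that delegates to whichever provider owns the path."""
    return provider.open(virtual_path)


provider = DatabaseProvider({"/db/spec.doc": b"stored in the database"})
empty_root = tempfile.mkdtemp()  # plays the role of the versioned files folder
on_disk = read_hardcoded(empty_root, "/db/spec.doc")         # None: nothing on disk
from_provider = read_via_provider(provider, "/db/spec.doc")  # the stored bytes
```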
Regards,
Fredrik Haglund,
INEXOR AB - http://blog.fredrikhaglund.se
I found a solution to my problem.
I am using SQL Server's full-text search functionality to create the search index (http://msdn2.microsoft.com/en-us/library/ms142571.aspx). Note that the index can be built on varbinary(max) or image columns.
I also extended the SearchDataSource control and overrode the PerformFileSearch method. In this method, a full-text search query is executed.
Then the Database virtual path provider uses the returned virtual paths to do its job.
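For readers who want to picture the flow, here is a rough sketch in Python. It is not the actual .NET code: `full_text_query` is a plain in-memory word match standing in for the SQL Server full-text query, and `StoredFile`, `perform_file_search` and the sample rows are hypothetical names and data made up for illustration.

```python
from typing import Dict, List, NamedTuple


class StoredFile(NamedTuple):
    virtual_path: str
    text: str       # text the full-text engine would extract from the binary column
    content: bytes  # raw file bytes as stored in the database


def full_text_query(table: List[StoredFile], term: str) -> List[str]:
    """Stand-in for the database full-text query: returns matching virtual paths."""
    needle = term.lower()
    return [row.virtual_path for row in table if needle in row.text.lower().split()]


def perform_file_search(table: List[StoredFile], term: str) -> Dict[str, bytes]:
    """Query first, then let the provider lookup resolve each returned path."""
    by_path = {row.virtual_path: row.content for row in table}  # provider lookup
    return {path: by_path[path] for path in full_text_query(table, term)}


table = [
    StoredFile("/dbfiles/budget.doc", "Annual budget report", b"\xd0\xcf"),
    StoredFile("/dbfiles/minutes.pdf", "Meeting minutes", b"%PDF"),
]
hits = perform_file_search(table, "budget")
```

The point of the split is that the search step only needs to return virtual paths; serving the file content stays the provider's responsibility.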
Hope someone finds this solution useful.