How to get file search to work for Episerver.Search.Cms v 10.0

Hi,

I am having trouble getting EPiServer.Search.Cms to work for file search. Page and content search work without problems, but I don't get any hits on MediaData.

I have it set up in Docker as suggested here: https://github.com/episerver/content-search-lucene

I can see the following error in the Docker logs when I index the site:

2023-04-21 14:39:07 fail: EPiServer.Search.IndexingService.Helpers.ResponseExceptionHelper[0]
2023-04-21 14:39:07       File for uri 'file:///C:/_Development/<repo>/src/<project>.Web/App_Data/blobs/809120097a1544ecac9d93f3b448c870/5385b76930d64f7ab8dbcb6c0d219077.docx' does not exist
2023-04-21 14:39:07 fail: EPiServer.Search.IndexingService.TaskQueue[0]
2023-04-21 14:39:07       An exception was thrown when task was invoked by TaskQueue: 'indexing service data uri callback'. The message was: Exception of type 'EPiServer.Search.IndexingService.HttpResponseException' was thrown.. Stacktrace was:    at EPiServer.Search.IndexingService.Helpers.ResponseExceptionHelper.HandleServiceError(String errorMessage) in /src/EPiServer.Search.IndexingService/Helpers/ResponseExceptionHelper.cs:line 16
2023-04-21 14:39:07          at EPiServer.Search.IndexingService.Helpers.LuceneHelper.HandleDataUri(FeedItemModel item, NamedIndex namedIndex) in /src/EPiServer.Search.IndexingService/Helpers/LuceneHelper.cs:line 617
2023-04-21 14:39:07          at EPiServer.Search.IndexingService.Models.DataUriQueueItem.Do() in /src/EPiServer.Search.IndexingService/Models/DataUriQueueItem.cs:line 18
2023-04-21 14:39:07          at EPiServer.Search.IndexingService.TaskQueue.Timer_Elapsed(Object sender, ElapsedEventArgs e) in /src/EPiServer.Search.IndexingService/TaskQueue.cs:line 59

My guess is that the file is not accessible inside the Docker container, since the path refers to a location on my host machine.
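
One way to test this guess is to check, from inside the container, whether the URI from the log resolves to a real file. Below is a minimal Python sketch, assuming Python is available wherever you run it. The helper names `file_uri_to_path` and `blob_is_reachable` are made up for illustration and are not part of EPiServer.Search.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def file_uri_to_path(uri: str) -> str:
    """Convert a file:// URI into a plain file system path string."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URI: {uri}")
    path = unquote(parsed.path)
    # Windows-style URIs look like file:///C:/dir/file, which parses to
    # "/C:/dir/file"; drop the leading slash in front of the drive letter.
    if len(path) >= 3 and path[0] == "/" and path[2] == ":":
        path = path[1:]
    return path


def blob_is_reachable(uri: str) -> bool:
    """Return True if the URI points at an existing file on this machine."""
    return Path(file_uri_to_path(uri)).is_file()
```

If `blob_is_reachable(...)` returns True on the host but False inside the container for the same URI from the log, that would support the idea that the container simply cannot see the blob folder.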

Does anyone else have experience setting up EPiServer.Search.Cms in Optimizely 12 and can give me some guidance?

#300499
Apr 21, 2023 13:33
I had the same problem in EPiServer.Search 9.0.3 and it still occurs for me in EPiServer.Search.Cms 10.0.0.
I haven't been able to pinpoint the exact issue, but it seems that indexing fails altogether when the service doesn't have access to the file blob.
I was able to work around it with the following configuration:

{
  "episerver.search": {
    "syndicationItemAttributeNameDataUri":"DataUriSkipThis"
  }
}

I've skipped all other keys for brevity. 

#300587
Apr 23, 2023 14:32
If a piece of content implements IBinaryStorable (which all classes deriving from MediaData do), then the indexing service expects to find a blob file. If it cannot access this file, the content is not indexed. Not exactly fail-safe, but that's how it works.

It looks like your indexing service is looking for blob files on its own local file system, which is surely not where your blobs are stored.

If your blobs are stored on a shared folder (an SMB share), then you can mount this share in your Docker container (read-only access should be enough). I included a few lines about exactly that in a previous blog post.

In your indexing service configuration, you can then set the Path property of the FileBlobProvider to that mount point. And that should be it.

#300589
Apr 23, 2023 16:17
Thanks to both of you for your input!

I got it to sort of work with Karol Berezicki's configuration. But now something weird happens because I specified a SearchRoot for my query.

In the end, the URL sent to the indexing service is this:

http://localhost:8000/api/indexing/search?q=((EPISERVER_SEARCH_DEFAULT:(förslag*)) OR (EPISERVER_SEARCH_TITLE:("förslag\*"^5))) AND (EPISERVER_SEARCH_VIRTUALPATH:(43f936c9\-9b23\-4ea3\-97b2\-61c538ad07c9|cb21792e\-828f\-4ece\-bb78\-80128b40988b*)) AND (EPISERVER_SEARCH_TYPE:("EPiServer.Core.MediaData,EPiServer")) AND (EPISERVER_SEARCH_ACL:(U\:episerver) OR EPISERVER_SEARCH_ACL:(G\:EditorInChief) OR EPISERVER_SEARCH_ACL:(G\:WebEditors) OR EPISERVER_SEARCH_ACL:(G\:WebAdmins) OR EPISERVER_SEARCH_ACL:(G\:Everyone) OR EPISERVER_SEARCH_ACL:(G\:Authenticated) OR EPISERVER_SEARCH_ACL:(G\:CmsAdmins) OR EPISERVER_SEARCH_ACL:(G\:CmsEditors) OR EPISERVER_SEARCH_ACL:(G\:VisitorGroupAdmins))&namedIndexes=&offset=0&limit=50&accessKey=XXXX

I identified that EPISERVER_SEARCH_VIRTUALPATH is where the problem is. The API call returns no results. However, if I remove

|cb21792e\-828f\-4ece\-bb78\-80128b40988b

from the virtual path and run the same call in Postman, I get a result. So I tried removing SearchRoot, and then I actually get a response that includes the file.

The problem now is that I would really like to keep SearchRoot set, since I have multiple "Start Pages" on this site and I don't want results from all of them.

Side note: This is only a problem with MediaData, not regular pages.
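
For anyone who wants to repeat the with/without comparison outside Postman, here is a minimal Python sketch that builds both URLs. It uses the endpoint and parameter names from the URL above, but the query is deliberately simplified: the title and ACL clauses are left out, so results may differ from the full query. The function names are made up for illustration, and the access key is a placeholder.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL in this post.
BASE_URL = "http://localhost:8000/api/indexing/search"


def virtual_path_clause(segments):
    """Build the virtual path clause: hyphens escaped, segments joined by '|'."""
    escaped = "|".join(s.replace("-", r"\-") for s in segments)
    return f"(EPISERVER_SEARCH_VIRTUALPATH:({escaped}*))"


def build_search_url(term, segments, access_key):
    """Build a simplified search URL; pass an empty list to omit the path clause."""
    parts = [f"(EPISERVER_SEARCH_DEFAULT:({term}*))"]
    if segments:
        parts.append(virtual_path_clause(segments))
    parts.append('(EPISERVER_SEARCH_TYPE:("EPiServer.Core.MediaData,EPiServer"))')
    params = {
        "q": " AND ".join(parts),
        "namedIndexes": "",
        "offset": 0,
        "limit": 50,
        "accessKey": access_key,  # placeholder, use your own key
    }
    return f"{BASE_URL}?{urlencode(params)}"


root = "43f936c9-9b23-4ea3-97b2-61c538ad07c9"
child = "cb21792e-828f-4ece-bb78-80128b40988b"
print(build_search_url("förslag", [root, child], "XXXX"))  # both segments
print(build_search_url("förslag", [root], "XXXX"))         # first segment only
print(build_search_url("förslag", [], "XXXX"))             # no SearchRoot
```

Requesting the three URLs in turn should show which form of the virtual path clause makes the MediaData hits disappear.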

#300639
Edited, Apr 24, 2023 8:34