Been following the advice in the following post:
I installed the EPiServer Search NuGet package into an empty web app, then built and ran it locally.
I updated my local EPiServer CMS site to use this search service and ran the reindex through /episerver/cms/admin/indexingcontent/indexingcontent.aspx.
All works ok. I can search for content in the CMS in the pages, blocks and media search boxes.
I then deployed this standalone indexing service to Azure, pointed my local CMS site at it and ran the reindex as before.
It appears to index correctly; it creates the index in an AppData folder under the site root.
I can still search for content in the CMS in the pages and blocks search boxes, but I cannot search for media.
Any ideas why the reindex would index the media correctly when running the indexing service locally, but not when running it in Azure?
I have completely deleted the index folder locally and up in Azure and still the same result, cannot search for media when using Azure indexing service.
Incidentally, I just discovered that searching media by content ID does find the content.
Strange why the page and block searches work fine with text.
So if anyone knows what the causes and fixes are for search working with content ids but not with text I'd be happy to hear them.
Some posts suggest the index is corrupt, and to reindex... but I have reindexed a few times now to no avail.
I see now that you already did exactly that.
The only thing I can think of is that the indexing service has a problem reading Azure blob storage or something like that.
If you download the index files to your computer and inspect them with a tool like Luke (https://code.google.com/p/luke/), can you see whether the index contains any info about the media, and if so, what info?
Basic setup here:
I downloaded the index files as you suggested and opened it with Luke and also opened the locally built index.
No changes have been made to CMS so indexes should in theory be exactly the same.
But what I found is:
Number of Documents : 5754 (Local) 5521 (Azure) so about 230+ documents are missing from the Azure index it seems.
Otherwise I don't know what else I can check with that tool... I don't know how to use it to build the same queries that EPiServer.Search would use.
I can't believe the problem could be at the local CMS site side... all I change is the baseUri for the namedIndexingService.
So I guess the indexingservice in azure must be having some trouble...
Will have to investigate and see if I can find out more from any log files etc. in Azure.
P.S. I copied up the locally built index to Azure and then the media searches work as expected.
So I think I managed to query the local index in LUKE for MediaData TYPE documents and I find 233, which is exactly the number of documents missing from the Azure index.
I was unsure whether it was just the MediaData files failing in the index or maybe other content types as well without me realising it, but it's pointing to just the MediaData content being missing from the index.
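To make the arithmetic explicit, here is a small check using only the counts reported from Luke in this thread. The variable names are just for illustration.

```python
# Document counts reported by Luke in this thread.
local_docs = 5754        # index built by the local indexing service
azure_docs = 5521        # index built by the Azure-hosted indexing service
local_media_docs = 233   # MediaData-type documents found in the local index

# Number of documents present locally but absent from the Azure index.
missing = local_docs - azure_docs

print(missing)                      # 233
print(missing == local_media_docs)  # True
```

The gap matches the MediaData count exactly, which is consistent with only the media documents being dropped.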
Is there any reason why the Azure index would be failing to index the media data?
Sorry, I have been busy.
As far as I know, there should not be any problem with this.
What blob provider do you use in your site?
I'm only running a Local CMS site, and just changing the indexing url from a local one, which works perfectly, to one in Azure.
This CMS site uses the framework AppData path to store the media blobs etc...
The Search Service builds the index just in ~/AppData folder.
I've tried adding log4net logging in Application_Error handler in the global.asax on the Search Service.
The log file is created in ~/AppData folder but nothing is ever logged.
As for the Azure logs, can't find anything meaningful.
It appears it is getting some internal server errors, but I can't find anything meaningful in the logs.
eventlog.xml in Azure just has some stuff about powershell, I think from the scm portal for the site.
The DetailedErrors just contains the standard 500 error pages, "Most likely causes" and "things to try" etc... even though I have custom errors off.
So hitting a brick wall with it.
On one hand, it looks like something is causing an exception outside of the app itself or surely I would catch the exceptions in my global.asax handler.
On the other hand, it still carries on and indexes all the content except the MediaData.
If you are running the site in Azure, you must use Azure blob storage or SQL blob storage as the blob provider
<add name="azureblobs" type="EPiServer.Azure.Blobs.AzureBlobProvider,EPiServer.Azure"
The only thing I have in Azure is the indexing service hosted in the empty web app.
And it does index the other content in the ~/AppData folder on the site.
Isn't that config above for the EPiServer CMS site, where it stores the media blobs?
I'm still running the CMS site locally and it's storing the blobs in the framework AppData path.
Ok, that is an interesting setup I have never tried :)
What other data do you have in /AppData that it manages to index?
If you run Fiddler when doing a reindex, can you see if there are differences between the request to the indexing service for the pages and for the media?
Well this is just playing about and testing the hosting of the indexing service in Azure. It shouldn't really matter where my CMS site is.
As I said, I used LUKE to open both indexes and found that the Azure one was missing 233 documents.
Using LUKE on the good index, I found that it held 233 MediaData documents.
I can search for pages and blocks no problem.
Running Fiddler, I don't know how I can tell which requests are for the media and which are for the pages (short of examining the payload on each :( )... about 11k-12k indexing requests get logged when I do a reindex.
Then it churns through them 100 at a time :(
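One rough way to avoid opening each payload by hand would be to save the request bodies from Fiddler and scan them for a marker string. This is only a sketch: it assumes the content type name (for example "MediaData") appears literally somewhere in the posted feed, which has not been confirmed here, and the sample bodies below are placeholders, not the real feed format.

```python
def split_by_marker(bodies, marker="MediaData"):
    """Split request bodies into those that contain the marker and those that don't."""
    with_marker = [body for body in bodies if marker in body]
    without_marker = [body for body in bodies if marker not in body]
    return with_marker, without_marker

# Placeholder bodies for illustration only; real bodies would be exported from Fiddler.
sample_bodies = [
    "<placeholder>... MediaData ...</placeholder>",
    "<placeholder>... PageData ...</placeholder>",
    "<placeholder>... BlockData ...</placeholder>",
]

media, other = split_by_marker(sample_bodies)
print(len(media), len(other))  # 1 2
```

If no saved body contains the marker at all, that would suggest the media items are never sent; if some do, the problem is more likely on the indexing service side.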
So I decided to start with a fresh Alloy MVC site. (Episerver v 9.2) Pointed it at the Azure Index and ReIndexed...
With Fiddler I saw:
1st request to azure index, /indexingservice/indexingservice.svc/reset?namedIndex=&accesskey=myaccesskey failed with internal server error
It then made 4 further /indexingservice/indexingservice.svc/update?accesskey=myaccesskey posting the xml feed.
All of which returned status 200
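The Fiddler observations above can be tallied per endpoint with a few lines of Python. This is a sketch under assumptions: the host is a placeholder, the session list is typed in by hand from the description in this post rather than exported from Fiddler, and the 500 status is assumed from "internal server error".

```python
from collections import Counter
from urllib.parse import urlsplit

# Placeholder host; the paths and query strings are the ones quoted in this post.
BASE = "https://example.invalid"

# (url, status) pairs: one failed reset followed by four successful updates.
session = [
    (BASE + "/indexingservice/indexingservice.svc/reset?namedIndex=&accesskey=myaccesskey", 500),
] + [
    (BASE + "/indexingservice/indexingservice.svc/update?accesskey=myaccesskey", 200),
] * 4

def operation(url):
    """Return the last path segment of the URL, e.g. 'reset' or 'update'."""
    return urlsplit(url).path.rsplit("/", 1)[-1]

# Count requests per (operation, status) combination.
tally = Counter((operation(url), status) for url, status in session)

for (op, status), count in sorted(tally.items()):
    print(op, status, count)
```

Grouping by operation and status makes it easy to see that only the reset call fails while every update call is accepted.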
In the feed of the first update call I did find some media documents... I tried searching for those in the CMS and it failed to find any.
Downloaded that index and viewed in LUKE and again it contained no MediaData.
The first request that fails looks like the one where it should empty/reset the index.
So it seems more that it does not send any at all.
I will try this out when I get my Azure account to work, right now it is not working for me
I finally managed to get the IndexingService logging working and can now see what's happening.
Firstly, the first call that was failing for /reset may be because I was manually deleting the entire index before reindexing.
Secondly, it looks like the media indexing tries to make callbacks to read the actual blobs from disk, and of course they aren't accessible to the external indexing service.
Callback for data uri 'file:///D:/VSO/DefCol/POC/Danny/EPiServerSite1/EPiServerSite1/App_Data/blobs/615ecc068478468fb2512b04b029f7f6/78e08f306fc44c03888da8bab14fee3e.png' enqueued
File for uri 'file:///D:/VSO/DefCol/POC/Danny/EPiServerSite1/EPiServerSite1/App_Data/blobs/615ecc068478468fb2512b04b029f7f6/78e08f306fc44c03888da8bab14fee3e.png' does not exist
An exception was thrown when task was invoked by TaskQueue: 'indexing service data uri callback'. The message was: Object reference not set to an instance of an object.. Stacktrace was: at EPiServer.Search.IndexingService.IndexingServiceSettings.SetResponseHeaderStatusCode(Int32 statusCode)
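The data URI in the log lines above can be taken apart to show why a remote service cannot resolve it. This is only an illustration in Python, not the indexing service's own logic; `is_machine_local` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Data URI copied from the log line in this thread.
DATA_URI = (
    "file:///D:/VSO/DefCol/POC/Danny/EPiServerSite1/EPiServerSite1/"
    "App_Data/blobs/615ecc068478468fb2512b04b029f7f6/"
    "78e08f306fc44c03888da8bab14fee3e.png"
)

def is_machine_local(uri):
    """Return True for a file URI that names no remote host."""
    parts = urlsplit(uri)
    return parts.scheme == "file" and parts.netloc in ("", "localhost")

parts = urlsplit(DATA_URI)
print(parts.scheme)                # file
print(repr(parts.netloc))          # '' (no host, so the path is relative to the reader's own disk)
print(is_machine_local(DATA_URI))  # True
```

Because the URI carries no host, whichever machine processes the callback looks on its own D: drive, and the Azure web app has no such file.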
Any ideas why it might need to read the blobs? To cache a thumbnail or something?
Unless we can prevent the callback, I guess this will never work.