Modifying Crawled Content

Hi, I'm working on Find 13.0.1, CMS 11.9.1. We have Find crawling an external site, and the results are included in our site search. Our problem is that the search returns too many results for certain search terms because those terms appear in the crawled site's navigation and footer, so they're repeated on almost every page.

One way we are trying to resolve this is via a scheduled job that will iterate each of the crawled WebContent objects and strip out all the content we don't want.
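To illustrate the "strip out" step, here is a minimal sketch in plain C#, assuming the unwanted navigation and footer text is known in advance as literal fragments. The CrawledTextCleaner class, the StripBoilerplate method and the sample fragments are hypothetical names used for illustration only; they are not part of the Find API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of EPiServer.Find.
public static class CrawledTextCleaner
{
    // Removes each known boilerplate fragment (e.g. navigation or footer text)
    // from the crawled text, then collapses the whitespace left behind.
    public static string StripBoilerplate(string searchText, IEnumerable<string> fragments)
    {
        if (string.IsNullOrEmpty(searchText))
        {
            return string.Empty;
        }

        var cleaned = searchText;
        foreach (var fragment in fragments)
        {
            // string.Replace throws on an empty search value, so skip those.
            if (string.IsNullOrEmpty(fragment))
            {
                continue;
            }

            cleaned = cleaned.Replace(fragment, " ");
        }

        // Passing a null separator array splits on any whitespace.
        var words = cleaned.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }
}
```

The cleaned string would then be the value written back to SearchText for each hit. Exact literal matching is the simplest choice here; if the crawled markup varies from page to page, the fragments would need to be matched more loosely.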

There's some good documentation at https://world.episerver.com/documentation/developer-guides/find/NET-Client-API/Indexing/ which discusses updating indexed objects if you have the item's ID. The only ID I can find is SearchHit.Id, but I have a feeling this isn't what I'm supposed to be using, as neither Update nor Delete works with it.

Below is what I'm using in the scheduled job.

I'm retrieving crawled content using SearchClient.Instance.Search<WebContent>() and iterating over the hits (SearchHit<WebContent>).


If I try to update the indexed item, I don't get an error, but the index isn't modified.

ITypeUpdated updateResult = SearchClient
    .Instance
    .Update(hit.Id)
    .Field(x => x.SearchText, "New Clean Search Text");

Alternatively, if I try to delete the item so I can replace it with a new 'cleaned' indexed item, I get a "Not Found" error.

DeleteResult deleteResult = SearchClient
    .Instance
    .Delete(DocumentId.Create(hit.Id));

Adding a new cleaned index item works OK, but that would only be a valid option if I can get the delete working.

IndexResult indexResult = SearchClient.Instance.Index(hit.Document);

Has anyone done this before? If so, where can I find the correct ID to pass? Or am I heading down the wrong path altogether?

Thanks,
Jason

#196700
Sep 09, 2018 7:25

I'm working with Jason on this, and we found we had missed .Execute() on the update call we were using, e.g.:

var updateResult = SearchClient
    .Instance
    .Update<WebContent>(hit.Id)
    .Field(x => x.SearchText, hit.Document.SearchText)
    .Execute();


However, now we are getting the following exception:

EPiServer.Find.ServiceException: 'An exception occured when trying to update item. No item exists with id. - The remote server returned an error: (404) Not Found.

To call update, we first perform a search and then iterate through each item in the hits to update them.

Does anyone know why we are getting a 404 for a document id we just retrieved?

Thanks

Kevin

#196705
Sep 10, 2018 6:36

Hi,

I have a similar problem. With the condition below, the search is able to find the record:

var getAllDocs = client.Search<Document>()
    .Filter(x => x.ContentLink.ID.Match(e.Content.ContentLink.ID))
    .GetContentResult<Document>().FirstOrDefault();

But when I try client.Update, it fails to find the ID, and no record is updated.

client.Update<Document>(e.Content.ContentLink.ID)
    .Field(x => ((Document)x).myContentPropertyName, ((Document)e.Content).myContentPropertyName)
    .Execute();

Is there something I am missing, or is something incorrect in my code?

Thanks.

Regards,

Rajesh K

#198072
Oct 19, 2018 15:54

We have the same issue here. Does anyone know how to fix this?

Thanks

#198616
Nov 01, 2018 8:36

I'm also experiencing problems with the EpiFind Delete methods:

IEnumerable<DeleteResult> result;
ContentIndexer.Instance.TryDelete(page, out result);

Response:

{EPiServer.Find.Api.DeleteResult}[0]
Found: False
Id: "_8742719d-b14e-4ff6-b131-b3e98060c332_en-GB"
Index: "myindexname_devindex"
Ok: true
Type: "MyNameSpace_CategoryPage"

However, I can see the document in the Find index (/secui/find/#overview/explore):

"LanguageBranch$$string": "en-GB"
....
"ContentGuid": "8742719d-b14e-4ff6-b131-b3e98060c332"

Find 13.0.1, CMS 11.10.1

Is this a bug in Find 13.0.1? Any help would be much appreciated.

Thanks,

Martin

#198675
Nov 02, 2018 12:04

Episerver support has confirmed that there's a bug with TryDelete().

This is the bug: https://world.episerver.com/support/Bug-list/bug/FIND-4048

The workaround for now is to use the SearchClient instead:

var localizable = content as ILocalizable;
if (localizable != null)
{
    SearchClient.Instance.Delete(content.GetType(), SearchClient.Instance.Conventions.IdConvention.GetId(content), localizable.LanguageRouting(), null);
}
else
{
    SearchClient.Instance.Delete(content.GetType(), SearchClient.Instance.Conventions.IdConvention.GetId(content), null);
}

#199169
Nov 19, 2018 9:20

I applied the fixes below, which solved my issue:

a) Added [IndexInContentAreas] to the class.

b) The new property had already been added with the [CultureSpecific] attribute.

c) Ran the search indexing job. It should add the new value to the search index, so that the filter condition below will work.

d) filter = filter.And(q => !((Document)q).HideDocumentInSearchResults.Match(true)); // for us, this hides documents/files that have the flag set to true from the search results

Regards,

Rajesh K

#200044
Dec 27, 2018 12:50