Don't index large files; implement conventions for IContentMedia instead, e.g.
Add this to your IndexingInitialization:
private static bool ShouldIndexFile(IContentMedia file)
{
    if (file == null)
        return false;

    try
    {
        // Index the file only if its binary data is 50 MB or smaller
        using (var stream = file.BinaryData.OpenRead())
        {
            return stream.Length <= 52428800; // 50 MB (50 * 1024 * 1024 bytes)
        }
    }
    catch (Exception e)
    {
        log.Error(String.Format("Unable to determine if the file {0} (ID: {1}) should be indexed.", file.Name, file.ContentLink.ID), e);
        return false;
    }
}
And this:
public void Initialize(InitializationEngine context)
{
    // Override the defaults if you like
    ContentIndexer.Instance.ContentBatchSize = 20; // default is 100
    ContentIndexer.Instance.MediaBatchSize = 3;    // default is 5

    // Remove media types that should not be indexed
    ContentIndexer.Instance.Conventions.ForInstancesOf<GenericMedia>().ShouldIndex(ShouldIndexFile); // add conventions for any custom MediaData implementations as well
    ContentIndexer.Instance.Conventions.ForInstancesOf<MediaData>().ShouldIndex(ShouldIndexFile);    // filter the base type

    // ...
}
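In case the surrounding plumbing is unclear, here is a minimal sketch of how the two snippets above might sit together in an initialization module. The class name, the module dependency, the usings, and the logger field are assumptions based on a typical Episerver Find CMS setup, so adjust them to your project:
using System;
using EPiServer.Core;
using EPiServer.Find.Cms;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using EPiServer.Logging;

[InitializableModule]
[ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
public class IndexingInitialization : IInitializableModule
{
    // Logger used by ShouldIndexFile above (the field name "log" is assumed)
    private static readonly ILogger log = LogManager.GetLogger(typeof(IndexingInitialization));

    public void Initialize(InitializationEngine context)
    {
        // Batch sizes and ShouldIndex conventions from the snippets above go here
    }

    public void Uninitialize(InitializationEngine context)
    {
        // Nothing to clean up for indexing conventions
    }

    // ShouldIndexFile from the first snippet goes here
}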
You may find some additional advice in this blog post by Ben Nitti: https://world.optimizely.com/blogs/repo-journal/dates/2021/4/optimizing-your-asset-indexing-with-conventions/
Hi Eric,
Thanks, this works like a charm. However, I still think that HTTP 413 errors should not trigger an automatic retry, since a file over 50 MB is guaranteed to produce the same error on every consecutive attempt.
There are several other settings you can add to fine-tune the search engine.
Try these to get started (update the values to your preference):
ContentIndexer.Instance.MaxTries = 1;
ContentIndexer.Instance.MaxWaitTime = 10;
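If it helps, here is a minimal sketch of where these settings could go, assuming they are applied in the same Initialize method as the conventions above (the exact behavior of MaxWaitTime is inferred from its name, so verify it against your Find version):
public void Initialize(InitializationEngine context)
{
    // Fail after a single attempt instead of retrying a file that will always be rejected as too large
    ContentIndexer.Instance.MaxTries = 1;

    // Assumed to cap the wait between indexing attempts
    ContentIndexer.Instance.MaxWaitTime = 10;

    // ...batch sizes and ShouldIndex conventions as shown earlier in the thread
}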
Hello,
I've noticed that having too many large files drastically increases the duration of the indexing job, since large files (presumably those over 50 MB?) cause an exception, which in turn triggers a retry. The exception thrown is ServiceException: The remote server returned an error: (413) Payload Too Large. Request entity too large.
For this particular error, the job should not retry indexing the file two more times, since it is certain that the subsequent requests will throw the same error. Below is an example of such a case. The waiting period between retries adds an extra 20 seconds per file to the indexing job.