Optimizing your Optimizely Search & Navigation service for large files
A while ago I had a client with an excess of large files. I had increased their upload size limit to 2 GB, and many of their documents were 50 MB or larger; a dozen or so files were between 1 and 2 GB. Episerver recommends not exceeding the default maximum request size of 50 MB.
Not surprisingly, the indexing job started timing out and required immediate attention.
I found there were several ways to improve performance by filtering these files out of the indexing job.
First, I created an initialization module and lowered the default batch sizes for the Find service. ContentBatchSize is used by the Find indexing job; MediaBatchSize is used for event-driven indexing of media types.
using EPiServer.Find.Cms;
using EPiServer.Find.Cms.Module;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;

[InitializableModule]
[ModuleDependency(typeof(IndexingModule))]
public class FileIndexingConventions : IInitializableModule
{
    public void Initialize(InitializationEngine context)
    {
        ContentIndexer.Instance.MediaBatchSize = 3; // Default is 5
        ContentIndexer.Instance.ContentBatchSize = 50; // Default is 100
    }

    public void Uninitialize(InitializationEngine context)
    {
        // Nothing to clean up. Throwing here would surface an exception when the site shuts down.
    }
}
I had several options for filtering out these large files. I could exclude IContentMedia from the index entirely, or do the same for a custom media type covering PDF and ZIP files.
ContentIndexer.Instance.Conventions.ForInstancesOf<MyPdfMediaType>().ShouldIndex(x => false);
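If a dedicated media type per file format is not available, the same idea can be expressed as a predicate on the file name. The following is only a minimal sketch: the class name MediaIndexFilter, the method HasExcludedExtension and the list of extensions are hypothetical and not part of the Find API, and it assumes the media item's name still carries the file extension.

```csharp
using System;
using System.Collections.Generic;

public static class MediaIndexFilter
{
    // Hypothetical list of extensions to keep out of the index.
    private static readonly HashSet<string> ExcludedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".pdf", ".zip" };

    // Returns true when the file name ends with one of the excluded extensions.
    public static bool HasExcludedExtension(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName))
            return false;

        // Take everything from the last dot onwards, e.g. ".PDF" for "report.PDF".
        var dot = fileName.LastIndexOf('.');
        if (dot < 0)
            return false;

        return ExcludedExtensions.Contains(fileName.Substring(dot));
    }
}
```

Under those assumptions, it could be plugged into the same convention, for example ForInstancesOf<IContentMedia>().ShouldIndex(x => !MediaIndexFilter.HasExcludedExtension(x.Name)).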
Alternatively, I could stop the binary data from being indexed by decorating the property with the [JsonIgnore] attribute:
public class MyPdfMediaType : MediaData
{
    [JsonIgnore]
    public override Blob BinaryData { get; set; }
}
But since the client wanted the file content to be searchable, I decided to filter out the binary data only when the file size reached the Find service limit.
ContentIndexer.Instance.Conventions.ForInstancesOf<IContentMedia>().IndexAttachment(x => !IsFileSizeLimitReached(x));
...and for this I used a helper method that checks the size of the binary data:
private static bool IsFileSizeLimitReached(IBinaryStorable binaryContent)
{
    // Note: the 37 MB maximum size refers to the base64-encoded file size.
    const int limitKb = 37000;
    try
    {
        var blobByte = (binaryContent.BinaryData as AzureBlob)?.ReadAllBytes() ??
                       (binaryContent.BinaryData as FileBlob)?.ReadAllBytes();
        if (blobByte == null)
            return false;

        double fileSize = blobByte.Length;
        var isLimitReached = (int)(fileSize / 1024) >= limitKb;
        return isLimitReached;
    }
    catch
    {
        return false;
    }
}
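The helper above reads the whole file into memory just to measure it, which gets expensive for files in the gigabyte range. A possible alternative, sketched here under assumptions, is to look only at the length of a stream and to make the base64 arithmetic explicit: base64 turns every 3 raw bytes into 4 characters, so the encoded length is 4 * ceil(n / 3). The class and method names are hypothetical, and the sketch assumes the blob can be opened as a seekable stream (for example via Blob.OpenRead(), which may or may not be seekable depending on the blob provider).

```csharp
using System;
using System.IO;

public static class AttachmentSizeCheck
{
    // Length in bytes of the padded base64 encoding of rawBytes bytes.
    public static long Base64EncodedLength(long rawBytes)
    {
        if (rawBytes < 0)
            throw new ArgumentOutOfRangeException(nameof(rawBytes));

        return 4 * ((rawBytes + 2) / 3);
    }

    // True when the base64-encoded content would reach the given limit.
    // Only the stream length is inspected; the content is never buffered.
    public static bool IsEncodedLimitReached(Stream content, long encodedLimitBytes)
    {
        // Unknown size: fall back to "not reached", like the original helper does.
        if (content == null || !content.CanSeek)
            return false;

        return Base64EncodedLength(content.Length) >= encodedLimitBytes;
    }
}
```

As a sanity check on the numbers, 37,000 KB of raw data (37,888,000 bytes) encodes to 50,517,336 bytes, which is consistent with a 37 MB threshold having been chosen to stay around a 50 MB request limit.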
Once this was in place, I was able to run the job with no exceptions, no timeouts, and a happy client!