Views: 462
Number of votes: 1
Average rating:

Optimizing your Optimizely Search & Navigation service for large files

Awhile ago I had a client with an excess of large files. I had increased their upload size limit to 2 GB and many of their documents were between 50 MB; a dozen or so files were between 1 and 2 GB.  Episerver recommends not exceeding the  by default 50 MB maximum request size.

Not surprisingly the indexing job started timing out and required immediate attention. 

I found there were several ways to tweak the the performance by filtering these files from the indexing job. 

I created an initialization module and changed the default batch sizes for the Find service. ContentBatchSize is used for the find index job, MediaBatchSize is for the event-driven indexing on media types. 

    [InitializableModule]
    [ModuleDependency(typeof(IndexingModule))]
    public class FileIndexingConventions : IInitializableModule
    {
        public void Initialize(InitializationEngine context)
        {
            ContentIndexer.Instance.MediaBatchSize = 3;     // Default is 5
            ContentIndexer.Instance.ContentBatchSize = 50;  // Default is 100
        }

        public void Uninitialize(InitializationEngine context)
        {
            throw new NotImplementedException();
        }
    }

I had several ways to filter out these large files. I could filter out IContentMedia from the index entirely or do the same with a custom type for pdfs and zip extensions.

ContentIndexer.Instance.Conventions.ForInstancesOf<MyPdfMediaType>().ShouldIndex(x => false);

Alternatively, I could stop the binary data from being indexed by decorating the propery with the [JsonIgnore] attribute:

    public class MyPdfMediaType : MediaData
    {
        [JsonIgnore]
        public override Blob BinaryData { get; set; }
    }

But since the client wanted to have the file content searchable, I decided only to filter the property when the filesize reached the find service limit. 

ContentIndexer.Instance.Conventions.ForInstancesOf<IContentMedia>().IndexAttachment(x => !IsFileSizeLimitReached(x));

...and for this I used an extention method to check against filesize binary data:

        private static bool IsFileSizeLimitReached(IBinaryStorable binaryContent)
        {
            // Note: 37 MB max. size refers to the base64 encoded file size .
            const int limitKb = 37000;

            try
            {
                var blobByte = (binaryContent.BinaryData as AzureBlob)?.ReadAllBytes() ??
                               (binaryContent.BinaryData as FileBlob)?.ReadAllBytes();

                if (blobByte == null)
                    return false;

                double fileSize = blobByte.Length;

                var isLimitReached = (int)(fileSize / 1024) >= limitKb;

                return isLimitReached;
            }
            catch
            {
                return false;
            }
        }

Once in place I was able to run the job with no exceptions, no timeouts and a happy client!

May 01, 2021

Please login to comment.