Calling all developers! We invite you to provide your input on Feature Experimentation by completing this brief survey.

 

Ben Nitti
May 1, 2021
  2899
(1 votes)

Optimizing your Optimizely Search & Navigation service for large files

Awhile ago I had a client with an excess of large files. I had increased their upload size limit to 2 GB and many of their documents were between 50 MB; a dozen or so files were between 1 and 2 GB.  Episerver recommends not exceeding the  by default 50 MB maximum request size.

Not surprisingly the indexing job started timing out and required immediate attention. 

I found there were several ways to tweak the the performance by filtering these files from the indexing job. 

I created an initialization module and changed the default batch sizes for the Find service. ContentBatchSize is used for the find index job, MediaBatchSize is for the event-driven indexing on media types. 

    [InitializableModule]
    [ModuleDependency(typeof(IndexingModule))]
    public class FileIndexingConventions : IInitializableModule
    {
        public void Initialize(InitializationEngine context)
        {
            ContentIndexer.Instance.MediaBatchSize = 3;     // Default is 5
            ContentIndexer.Instance.ContentBatchSize = 50;  // Default is 100
        }

        public void Uninitialize(InitializationEngine context)
        {
            throw new NotImplementedException();
        }
    }

I had several ways to filter out these large files. I could filter out IContentMedia from the index entirely or do the same with a custom type for pdfs and zip extensions.

ContentIndexer.Instance.Conventions.ForInstancesOf<MyPdfMediaType>().ShouldIndex(x => false);

Alternatively, I could stop the binary data from being indexed by decorating the propery with the [JsonIgnore] attribute:

    public class MyPdfMediaType : MediaData
    {
        [JsonIgnore]
        public override Blob BinaryData { get; set; }
    }

But since the client wanted to have the file content searchable, I decided only to filter the property when the filesize reached the find service limit. 

ContentIndexer.Instance.Conventions.ForInstancesOf<IContentMedia>().IndexAttachment(x => !IsFileSizeLimitReached(x));

...and for this I used an extention method to check against filesize binary data:

        private static bool IsFileSizeLimitReached(IBinaryStorable binaryContent)
        {
            // Note: 37 MB max. size refers to the base64 encoded file size .
            const int limitKb = 37000;

            try
            {
                var blobByte = (binaryContent.BinaryData as AzureBlob)?.ReadAllBytes() ??
                               (binaryContent.BinaryData as FileBlob)?.ReadAllBytes();

                if (blobByte == null)
                    return false;

                double fileSize = blobByte.Length;

                var isLimitReached = (int)(fileSize / 1024) >= limitKb;

                return isLimitReached;
            }
            catch
            {
                return false;
            }
        }

Once in place I was able to run the job with no exceptions, no timeouts and a happy client!

May 01, 2021

Comments

Please login to comment.
Latest blogs
Level Up with Optimizely's Newly Relaunched Certifications!

We're thrilled to announce the relaunch of our Optimizely Certifications—designed to help partners, customers, and developers redefine what it mean...

Satata Satez | Jan 14, 2025

Introducing AI Assistance for DBLocalizationProvider

The LocalizationProvider for Optimizely has long been a powerful tool for enhancing the localization capabilities of Optimizely CMS. Designed to ma...

Luc Gosso (MVP) | Jan 14, 2025 | Syndicated blog

Order tabs with drag and drop - Blazor

I have started to play around a little with Blazor and the best way to learn is to reimplement some old stuff for CMS12. So I took a look at my old...

Per Nergård | Jan 14, 2025

Product Recommendations - Common Pitfalls

With the added freedom and flexibility that the release of the self-service widgets feature for Product Recommendations provides you as...

Dylan Walker | Jan 14, 2025

My blog is now running using Optimizely CMS!

It's official! You are currently reading this post on my shiny new Optimizely CMS website.  In the past weeks, I have been quite busy crunching eve...

David Drouin-Prince | Jan 12, 2025 | Syndicated blog

Developer meetup - Manchester, 23rd January

Yes, it's that time of year again where tradition dictates that people reflect on the year gone by and brace themselves for the year ahead, and wha...

Paul Gruffydd | Jan 9, 2025