Ben Nitti
May 1, 2021
  1820
(1 votes)

Optimizing your Optimizely Search & Navigation service for large files

Awhile ago I had a client with an excess of large files. I had increased their upload size limit to 2 GB and many of their documents were between 50 MB; a dozen or so files were between 1 and 2 GB.  Episerver recommends not exceeding the  by default 50 MB maximum request size.

Not surprisingly the indexing job started timing out and required immediate attention. 

I found there were several ways to tweak the the performance by filtering these files from the indexing job. 

I created an initialization module and changed the default batch sizes for the Find service. ContentBatchSize is used for the find index job, MediaBatchSize is for the event-driven indexing on media types. 

    [InitializableModule]
    [ModuleDependency(typeof(IndexingModule))]
    public class FileIndexingConventions : IInitializableModule
    {
        public void Initialize(InitializationEngine context)
        {
            ContentIndexer.Instance.MediaBatchSize = 3;     // Default is 5
            ContentIndexer.Instance.ContentBatchSize = 50;  // Default is 100
        }

        public void Uninitialize(InitializationEngine context)
        {
            throw new NotImplementedException();
        }
    }

I had several ways to filter out these large files. I could filter out IContentMedia from the index entirely or do the same with a custom type for pdfs and zip extensions.

ContentIndexer.Instance.Conventions.ForInstancesOf<MyPdfMediaType>().ShouldIndex(x => false);

Alternatively, I could stop the binary data from being indexed by decorating the propery with the [JsonIgnore] attribute:

    public class MyPdfMediaType : MediaData
    {
        [JsonIgnore]
        public override Blob BinaryData { get; set; }
    }

But since the client wanted to have the file content searchable, I decided only to filter the property when the filesize reached the find service limit. 

ContentIndexer.Instance.Conventions.ForInstancesOf<IContentMedia>().IndexAttachment(x => !IsFileSizeLimitReached(x));

...and for this I used an extention method to check against filesize binary data:

        private static bool IsFileSizeLimitReached(IBinaryStorable binaryContent)
        {
            // Note: 37 MB max. size refers to the base64 encoded file size .
            const int limitKb = 37000;

            try
            {
                var blobByte = (binaryContent.BinaryData as AzureBlob)?.ReadAllBytes() ??
                               (binaryContent.BinaryData as FileBlob)?.ReadAllBytes();

                if (blobByte == null)
                    return false;

                double fileSize = blobByte.Length;

                var isLimitReached = (int)(fileSize / 1024) >= limitKb;

                return isLimitReached;
            }
            catch
            {
                return false;
            }
        }

Once in place I was able to run the job with no exceptions, no timeouts and a happy client!

May 01, 2021

Comments

Please login to comment.
Latest blogs
Update on .NET 8 support

With .NET 8 now in release candidate stage I want to share an update about our current thinking about .NET 8 support in Optimizely CMS and Customiz...

Magnus Rahl | Oct 3, 2023

Adding Block Specific JavaScript and CSS to the body and head of the page

A common requirement for CMS implementations includes the ability to embed third party content into a website as if it formed part of the website...

Mark Stott | Oct 3, 2023

Performance optimization – the hardcore series – part 1

Hi again every body. New day – new thing to write about. today we will talk about memory allocation, and effect it has on your website performance....

Quan Mai | Oct 3, 2023 | Syndicated blog

IDX21323 - RequireNonce is true, Nonce was null

We have multiple clients configured with Azure Active Directory (Microsoft Entra) for requiring authentication when accessing their website. The...

David Drouin-Prince | Oct 1, 2023 | Syndicated blog

Minimum Detectable Effect in Optimizely Web Experimentation

Understanding Minimum Detectable Effect Minimum Detectable Effect (MDE) is a core statistical calculation in Optimizely Web Experimentation. In...

Matthew Dunn | Oct 1, 2023 | Syndicated blog

Configured Commerce - Introduction to Long-Term Support (LTS) Releases

First off, for those who have not had a chance to meet me yet, my name is John McCarroll, and I am the Technical Product Manager for the Optimizely...

John McCarroll | Sep 29, 2023