A critical vulnerability was discovered in React Server Components (Next.js). Our systems remain protected but we advise to update packages to newest version. Learn More

Ben Nitti
May 1, 2021
  4273
(1 votes)

Optimizing your Optimizely Search & Navigation service for large files

Awhile ago I had a client with an excess of large files. I had increased their upload size limit to 2 GB and many of their documents were between 50 MB; a dozen or so files were between 1 and 2 GB.  Episerver recommends not exceeding the  by default 50 MB maximum request size.

Not surprisingly the indexing job started timing out and required immediate attention. 

I found there were several ways to tweak the the performance by filtering these files from the indexing job. 

I created an initialization module and changed the default batch sizes for the Find service. ContentBatchSize is used for the find index job, MediaBatchSize is for the event-driven indexing on media types. 

    [InitializableModule]
    [ModuleDependency(typeof(IndexingModule))]
    public class FileIndexingConventions : IInitializableModule
    {
        public void Initialize(InitializationEngine context)
        {
            ContentIndexer.Instance.MediaBatchSize = 3;     // Default is 5
            ContentIndexer.Instance.ContentBatchSize = 50;  // Default is 100
        }

        public void Uninitialize(InitializationEngine context)
        {
            throw new NotImplementedException();
        }
    }

I had several ways to filter out these large files. I could filter out IContentMedia from the index entirely or do the same with a custom type for pdfs and zip extensions.

ContentIndexer.Instance.Conventions.ForInstancesOf<MyPdfMediaType>().ShouldIndex(x => false);

Alternatively, I could stop the binary data from being indexed by decorating the propery with the [JsonIgnore] attribute:

    public class MyPdfMediaType : MediaData
    {
        [JsonIgnore]
        public override Blob BinaryData { get; set; }
    }

But since the client wanted to have the file content searchable, I decided only to filter the property when the filesize reached the find service limit. 

ContentIndexer.Instance.Conventions.ForInstancesOf<IContentMedia>().IndexAttachment(x => !IsFileSizeLimitReached(x));

...and for this I used an extention method to check against filesize binary data:

        private static bool IsFileSizeLimitReached(IBinaryStorable binaryContent)
        {
            // Note: 37 MB max. size refers to the base64 encoded file size .
            const int limitKb = 37000;

            try
            {
                var blobByte = (binaryContent.BinaryData as AzureBlob)?.ReadAllBytes() ??
                               (binaryContent.BinaryData as FileBlob)?.ReadAllBytes();

                if (blobByte == null)
                    return false;

                double fileSize = blobByte.Length;

                var isLimitReached = (int)(fileSize / 1024) >= limitKb;

                return isLimitReached;
            }
            catch
            {
                return false;
            }
        }

Once in place I was able to run the job with no exceptions, no timeouts and a happy client!

May 01, 2021

Comments

Please login to comment.
Latest blogs
A day in the life of an Optimizely OMVP - OptiGraphExtensions v2.0: Enhanced Search Control with Language Support, Synonym Slots, and Stop Words

Supercharge your Optimizely Graph search experience with powerful new features for multilingual sites and fine-grained search tuning. As search...

Graham Carr | Dec 16, 2025

A day in the life of an Optimizely OMVP - Optimizely Opal: Specialized Agents, Workflows, and Tools Explained

The AI landscape in digital experience platforms has shifted dramatically. At Opticon 2025, Optimizely unveiled the next evolution of Optimizely Op...

Graham Carr | Dec 16, 2025

Optimizely CMS - Learning by Doing: EP09 - Create Hero, Breadcrumb's and Integrate SEO : Demo

  Episode 9  is Live!! The latest installment of my  Learning by Doing: Build Series  on  Optimizely Episode 9 CMS 12  is now available on YouTube!...

Ratish | Dec 15, 2025 |

Building simple Opal tools for product search and content creation

Optimizely Opal tools make it easy for AI agents to call your APIs – in this post we’ll build a small ASP.NET host that exposes two of them: one fo...

Pär Wissmark | Dec 13, 2025 |

CMS Audiences - check all usage

Sometimes you want to check if an Audience from your CMS (former Visitor Group) has been used by which page(and which version of that page) Then yo...

Tuan Anh Hoang | Dec 12, 2025

Data Imports in Optimizely: Part 2 - Query data efficiently

One of the more time consuming parts of an import is looking up data to update. Naively, it is possible to use the PageCriteriaQueryService to quer...

Matt FitzGerald-Chamberlain | Dec 11, 2025 |