Content Graph, the Story So Far From a Developer's Perspective
As content is created and managed, the volume of data grows. The need to fish for information in the ocean of data becomes crucial.
“A world where everyone creates content gets confusing pretty quickly without a good search engine.” --- Ethan Zuckerman
Here search engines -- and our new service Content Graph -- come into play! In this blog post, I will explain how we started our journey of creating this brand-new service in-house at Optimizely from a developer's point of view.
We already have Search & Navigation, a very powerful fluent C# client API that allows you to search your content. It is tightly coupled to the CMS and to site building. As we offer a DXP with a headless CMS in the Content Cloud, there is an opportunity to decouple and offer a new service with capabilities similar to Search & Navigation, but one that is not tied to a programming language or framework and is easy to use for developers and even non-developers. John Håkansson has already announced the public beta of this service, called Content Graph. Jonas Bergqvist has created a tutorial on how to get started with React and TypeScript. In this blog post, I reflect on what we have done so far and try to illustrate the vision and hopes for this new service from an engineering perspective.
Opportunity to Optimize
We identified an opportunity to help our customers. But as engineers, we also see that a new service brings new technical opportunities to do things not only differently, but also better. There are lessons to be learned from Search & Navigation in terms of operations, but also from common and not-so-common use cases. We wanted to learn from our vast experience and improve as much as we can with this new service, and to have a solid foundation on which to deliver more and more value with quality in the future.
New architecture, newer technologies
As Content Graph is a new service, it has been built entirely from scratch at Optimizely. The GraphQL runtime is hosted on a CDN, which allows for the lowest and most stable latencies possible across regions. This runtime interfaces with a set of interconnected, containerized micro-services running on Kubernetes. We also use OpenSearch, the open-source fork of Elasticsearch, as a distributed search engine to store, index and retrieve data. It is also deployed as containers on Kubernetes. For all our micro-services and OpenSearch clusters, we have configured horizontal auto-scaling, and in the case of OpenSearch, we use a Kubernetes operator. All of this is provisioned by automated pipelines with unit, end-to-end and performance tests to ensure continued quality.
This setup allows our geographically distributed teams to simplify the development, release and deployment processes, something we are constantly improving. It also lets us optimize IT costs by being smarter about allocating resources to the different micro-services using horizontal auto-scaling. It makes our service more stable and highly available, with self-healing when an instance of a service crashes. And it makes us independent of any cloud provider, just in case.
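To make horizontal auto-scaling a bit more concrete, here is a minimal sketch of the core formula documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count scaled by the ratio of the observed metric to the target metric, rounded up. The function name, the replica bounds and the metric values are assumptions for illustration only and do not reflect how Content Graph is actually configured. The real autoscaler also applies a tolerance and stabilization windows, which this sketch leaves out.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Replica count suggested by the HPA core formula, clamped to bounds.

    desired = ceil(current_replicas * current_metric / target_metric)
    The bounds and all inputs here are hypothetical example values.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Never scale below the minimum or above the maximum.
    return max(min_replicas, min(max_replicas, raw))


# Load above target: scale out. Load below target: scale in.
scale_out = desired_replicas(4, current_metric=90, target_metric=60)
scale_in = desired_replicas(4, current_metric=30, target_metric=60)
capped = desired_replicas(2, current_metric=500, target_metric=50)
```

With four replicas at 90% average CPU against a 60% target, the formula asks for six replicas; at 30% it asks for two. This is where the cost saving comes from: capacity follows load instead of being provisioned for the peak.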
As with any new service built from scratch, one that is offered as-a-service, we need to be prepared and ready to understand what is going on, make sure everything is up and running, and offer support in case of issues. In the beta phase of Content Graph, the development team also offers operational support.
We have spent considerable time on operational readiness, i.e., to improve our logging and traceability, so we can improve our monitoring and alerting capabilities in case of system issues but also to get a greater understanding of usage metrics. Dashboards with metrics have been created and alerts via email and chat have been set.
Good developer documentation is pivotal as well. We made writing developer documentation part of our development work. A user story will only be accepted as done when the documentation is there or has been updated. The developers' documentation of Content Graph is kept up to date as we extend our product.
This all gives us the feeling that we can be the captain of our ship, as more passengers board to join this exciting journey.
Performance and experimentation
Having the best possible performance is also one of our key objectives. By performance we mean here the efficiency of processing requests and returning responses with the lowest latency possible.
“There are no speed limits on the road to success.” --- David W. Johnson
It is a given that we will always have a hop between the CDN and our Kubernetes cluster via the internet. We reduce the impact of this hop by caching in the CDN. The distribution of the queries sent to our system will likely have a long tail, but we expect the head of the distribution to account for the bulk of the requests. Caching these frequent queries means that most requests can be answered directly from the CDN, which gives the fastest possible response. Content can also be updated frequently, so we invalidate caches when there are updates. Caching will be made smarter in the future.
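The effect of caching a skewed query distribution, and of invalidating on updates, can be sketched in a few lines. This is a toy in-memory cache, not the CDN cache that Content Graph uses; the class name, the query strings and the choice to clear everything on an update are assumptions made for illustration.

```python
class QueryCache:
    """Toy query cache that counts hits and misses (illustrative only)."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, query, fetch):
        """Return the cached result, calling `fetch` only on a miss."""
        if query in self._entries:
            self.hits += 1
            return self._entries[query]
        self.misses += 1
        result = fetch(query)  # stands in for the hop to the cluster
        self._entries[query] = result
        return result

    def invalidate_all(self):
        """Drop every entry, e.g. after content has been updated."""
        self._entries.clear()


origin_calls = []


def fetch_from_origin(query):
    origin_calls.append(query)
    return f"results for {query}"


# A made-up workload: two popular queries form the head, two rare ones the tail.
workload = ["home"] * 6 + ["news"] * 3 + ["rare-a", "rare-b"]

cache = QueryCache()
for q in workload:
    cache.get(q, fetch_from_origin)

hits_before_update = cache.hits
misses_before_update = cache.misses

cache.invalidate_all()                 # content was updated
cache.get("home", fetch_from_origin)   # must go back to the origin once
```

In this workload, 7 of the 11 requests are served from the cache because the two popular queries repeat. After the invalidation, the next request for a popular query pays the hop once and is then cached again.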
Besides caching in the CDN, we spent time on improving the performance within the cluster by tracing the response times of the different services. Benchmarking with performance tests is crucial here. Given the same experimental conditions, we can form a hypothesis, tweak things that we think will be an improvement (such as different configurations or different implementations) and compare the results. Every hundred milliseconds gained counts! As with software development, we experiment in an Agile way: incrementally and iteratively. And here too, documentation is key. In a future blog post, we will offer more insights into the setup and results of our experimentation.
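Comparing two variants under the same conditions usually comes down to comparing latency summaries. The sketch below is one simple way to do that, using the median and a nearest-rank 95th percentile. The helper names are assumptions, and the latency samples are synthetic numbers invented for the example, not measurements from Content Graph.

```python
import math
import statistics


def summarize(latencies_ms):
    """Median and nearest-rank p95 of a list of latencies in milliseconds."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # nearest-rank method
    return {"p50": statistics.median(ordered), "p95": ordered[rank - 1]}


def compare(baseline_ms, candidate_ms):
    """Per-metric difference; negative values mean the candidate is faster."""
    base, cand = summarize(baseline_ms), summarize(candidate_ms)
    return {metric: cand[metric] - base[metric] for metric in base}


# Synthetic samples, for illustration only.
baseline = [120, 130, 125, 140, 400, 128, 135, 122, 131, 127]
candidate = [100, 105, 98, 110, 250, 102, 108, 101, 104, 99]

delta = compare(baseline, candidate)
```

Looking at a tail percentile next to the median matters because a change can improve the typical request while leaving the slowest requests untouched, or the other way around.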
Search as a Service, Reloaded
Content Graph can deliver content to you. It offers an intuitive query language that allows you to precisely filter, select and navigate the information you need based on strongly typed schemas. What you request is exactly what you get. Content delivery for building or populating sites is made simple and easy.
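The idea that what you request is exactly what you get can be illustrated without any GraphQL tooling. The sketch below projects a record onto a selection of fields, which is the principle behind a GraphQL selection set. It is not how Content Graph is implemented, and the `article` record, its field names and the `select` helper are all hypothetical.

```python
def select(record, selection):
    """Return only the fields named in `selection`.

    `selection` maps a field name to True (take the value as is) or to a
    nested selection that is applied to a nested dict or list of dicts.
    """
    result = {}
    for field, sub in selection.items():
        value = record[field]
        if sub is True:
            result[field] = value
        elif isinstance(value, list):
            result[field] = [select(item, sub) for item in value]
        else:
            result[field] = select(value, sub)
    return result


# Hypothetical content item with more fields than the client needs.
article = {
    "title": "Hello",
    "body": "A long body text",
    "author": {"name": "Ada", "email": "ada@example.com"},
    "tags": [{"id": 1, "name": "news"}, {"id": 2, "name": "search"}],
}

# The client asks for the title, the author's name and the tag names only.
wanted = {"title": True, "author": {"name": True}, "tags": {"name": True}}

response = select(article, wanted)
```

The response has the same shape as the selection and contains nothing else: no body, no email address, no tag ids.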
But the core of Content Graph is driven by a search engine. Search engines are information systems designed to help find stored information with a query; the results are often delivered as an ordered list. The primary advantage of a search engine is flexible retrieval of results with high performance in both efficiency and effectiveness. So a key differentiator of Content Graph is its search capabilities. We offer precise and powerful text matching. You can increase visitor engagement on your site by offering site search, e.g., by adding a search box and facets, and by ranking your results in different ways that are effective, accurate and efficient. We offer value-based ordering by fields as well as state-of-the-art relevance ranking. The information is returned in an order intended to drive improved conversions. This is a great way to share content with visitors, but also with other systems such as web crawlers. We will continue to offer new search capabilities that will make a big difference for our customers, improving on and going beyond what we offer now. And if you have ideas or a wishlist, you are very welcome to share them with us.
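For readers new to site search, the difference between facets, value-based ordering and relevance ranking can be sketched on a handful of results. Everything here is an assumption for illustration: the result records, the field names and the precomputed `score` values are made up, and a real engine computes relevance scores itself rather than receiving them.

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results fall under each value of `field`."""
    return Counter(item[field] for item in results)


def order_by_field(results, field, descending=False):
    """Value-based ordering: sort by the value of one field."""
    return sorted(results, key=lambda item: item[field], reverse=descending)


def order_by_relevance(results):
    """Relevance ranking: highest score first."""
    return sorted(results, key=lambda item: item["score"], reverse=True)


# Hypothetical search results with made-up relevance scores.
results = [
    {"title": "A", "type": "Article", "published": "2023-01-10", "score": 1.2},
    {"title": "B", "type": "Product", "published": "2023-03-05", "score": 3.4},
    {"title": "C", "type": "Article", "published": "2022-11-20", "score": 2.1},
]

facets = facet_counts(results, "type")
newest_first = [r["title"] for r in order_by_field(results, "published", True)]
most_relevant = [r["title"] for r in order_by_relevance(results)]
```

The same result set yields two different orderings: newest first is B, A, C, while most relevant first is B, C, A. The facet counts (two articles, one product) are what a search page would show next to its filters.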
I would argue that Content Graph is in essence a search engine with graph capabilities, but one that is offered as a service and works out of the box, without complicated configuration, a steep learning curve, specialist know-how, or operational costs. Building a search engine on your data to provide access to your content has become very easy. The time to market is reduced. And we allow you to focus on what’s most important to you: delivering optimal user experiences, including the search experience, alongside content creation on your platform. Stay tuned for a future blog post by me about this topic.
So, buckle up, we are in for a fun ride! The journey continues.
Content Graph is available for all Optimizely DXP customers. Feel free to get started.
Help us gather bugs! Users of the beta release are encouraged to report any bugs to our support team.
Feature or change requests are warmly welcomed as well. You can create ideas and feedback here.
The developers of Content Graph look forward to receiving your feedback on this beta release!