Why Content Graph is Search as a Service
John Håkansson has already announced the public beta of a new Optimizely service called Content Graph. Jonas Bergqvist has created a tutorial on how to get started with React and TypeScript. I blogged previously about our developer journey so far in creating this service. In this blog post I want to follow up and explain why Content Graph is not only for content delivery but can also be the search engine for your site.
As recently as the 1990s, studies showed that most people preferred getting information from other people rather than from search engines. Back then, most people also used human travel agents to book their travel or asked a librarian to find a book. However, times have changed. Over the last decades, improvements in information retrieval effectiveness have driven web search engines to a level of quality where most people are satisfied most of the time, and web search has become the preferred way to find information.
Any website that continuously publishes new content needs a search engine, and Content Graph provides one. It is an Optimizely SaaS solution for creating a website with search functionality through a GraphQL API hosted on the CDN. On the surface, it is a very nice, platform-independent content delivery API based on GraphQL. However, I would argue that Content Graph primarily lets you build an advanced search engine, not merely a website, because it can do much more.
Why Is Building a Search Engine Hard?
“Consider a future device … in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.” – Vannevar Bush, 1945
The vision of a single system that you can consult as an archive with speed and flexibility has been realized with the advent of search engines. A search engine is a complex system, and building and maintaining one is hard and costly. You have to store the data so that sophisticated matching and ranking algorithms can run on it, retrieve it as fast as possible, and process queries that capture anything from simple to very complicated information needs.
There are off-the-shelf search engines that let you build your own search engine on your own data. Well-known ones are Apache Solr, Elasticsearch and OpenSearch, which are all built on the information retrieval library Lucene. Lesser-known but no less powerful examples are Sphinx and Vespa. What they all have in common is that you need to spend considerable time and effort to build an application with them: configuring and hosting the infrastructure, defining and tweaking index schemas, preparing data for ingestion, and writing optimal queries. In short, a lot of preparation and testing is needed before you can deploy your search engine.
Elastic offers Elasticsearch as a service in a single-tenant environment, and AWS and others do the same for OpenSearch. In essence, this is platform-as-a-service: you no longer need to manage hosting and operations yourself, but you are still responsible for the software layer above it and for using the platform in the best possible way in your application. Getting it implemented is one thing; getting it right so that you get good performance and conversion rates is another cup of tea, and it requires continuous experimentation and improvement.
How can we move a search engine from platform-as-a-service to true software-as-a-service, where we do the heavy lifting for you? And how can we make sure you get it right for your business context?
A Search Engine as a Service?
Imagine that we could move up the value chain and handle everything for you: ingesting and enriching your content, configuration, and tuning the most efficient and effective queries and the most useful ranking models. All you would need to do to get started is add a few lines of configuration and hit a button. That would be true software as a service, and in the case of Content Graph, search (engine) as a service.
This is exactly what we have created.
Content Graph is a multi-tenant cloud service that offers you search as a service with GraphQL. We believe that GraphQL lets you write intuitive and simple queries without a steep learning curve, because it is grounded in a strongly typed schema that supports introspection. The query language we developed on top of GraphQL lets you query the content you created in Content Cloud, so you could build a whole site with a single GraphQL query. It also lets you build site search with the same API, using predefined query templates to filter, match, retrieve and rank results for the keywords that site visitors enter. All you need is one endpoint and the authentication keys provisioned by the DXP portal. Sounds easy, right?
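As a minimal sketch of what "one endpoint and a key" can look like from a TypeScript client, the function below builds a standard GraphQL-over-HTTP POST request: a JSON body with the query text and its variables. Everything specific here is an assumption for illustration: the function name buildSearchRequest, passing the key as an auth query parameter, and the Content, _fulltext, Name and RelativePath names in the query. Check the real schema through introspection and the developer documentation.

```typescript
// Sketch of a GraphQL-over-HTTP request to a single search endpoint.
// ASSUMPTIONS: the auth query parameter, the schema names in SEARCH_QUERY
// (Content, _fulltext, Name, RelativePath) and buildSearchRequest are hypothetical.
const SEARCH_QUERY = `
  query Search($keywords: String!, $limit: Int) {
    Content(where: { _fulltext: { contains: $keywords } }, limit: $limit) {
      items { Name RelativePath }
      total
    }
  }`;

interface GraphQLRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildSearchRequest(endpoint: string, key: string, keywords: string, limit = 10): GraphQLRequest {
  return {
    // Assumption: the key travels as a query parameter; URL-encode it either way.
    url: `${endpoint}?auth=${encodeURIComponent(key)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Standard GraphQL-over-HTTP body: the query text plus its variables.
      body: JSON.stringify({ query: SEARCH_QUERY, variables: { keywords, limit } }),
    },
  };
}

// Usage (needs a real endpoint and key from the DXP portal):
// const { url, init } = buildSearchRequest("https://<your-endpoint>", "<your-key>", "graphql");
// const result = await (await fetch(url, init)).json();
```

Keeping the visitor's keywords in variables rather than splicing them into the query string means one predefined query template can serve every search.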
Besides search features, we have worked hard on our infrastructure code and configuration. You do not need to worry about operations; we manage that for you. That does not mean we are constantly busy managing clusters by hand. The beating (search engine) heart of Content Graph is now OpenSearch, and we have spent considerable time automating our multi-tenant distributed platform. This lets us do rolling upgrades to newer (major) OpenSearch versions with bug fixes, performance improvements and new features and, when necessary, upgrade all indices afterwards, without interrupting or degrading the service. Our distributed search engine clusters also auto-scale, so we are ready for Black Friday-type scenarios. At the same time, we realize that a multi-tenant platform may need performance isolation. Have a very noisy neighbor on the same cluster? Need single-tenant support? We are ready.
What Content Graph can do as a Search Engine
Ranking & Matching
Content Graph offers a query language that lets you precisely filter, select and navigate the information you need. What you request is exactly what you get, preferably at the top. We support value-based ordering by fields as well as state-of-the-art BM25 relevance ranking, and results are returned fast. You can rank your results in very different ways, accurately and efficiently. Results are selected and ranked either by filtering on field values, with wildcard support (where we optimized suffix searches), or by full-text search with language analysis on the fields you choose. We support text analysis for all languages that Search & Navigation supports, and have improved full-text search for German and the CJK (Chinese, Japanese and Korean) languages. This lets you increase visitor engagement on your site by adding the all-important search box.
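To illustrate what BM25 relevance ranking computes, here is a small in-memory scorer. It is a sketch of the classic BM25 formula with a Lucene-style non-negative IDF and the common defaults k1 = 1.2 and b = 0.75, not the Content Graph or OpenSearch implementation. The naive tokenizer stands in for the language analysis described above, and bm25Rank is a hypothetical name.

```typescript
// Minimal in-memory BM25 scorer (classic formula, Lucene-style IDF).
// Illustrative only: not the Content Graph or OpenSearch implementation.
type Doc = { id: string; text: string };

// Naive tokenizer standing in for real language analysis (ASCII word characters only).
const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/\W+/).filter((t) => t.length > 0);

function bm25Rank(docs: Doc[], query: string, k1 = 1.2, b = 0.75): { id: string; score: number }[] {
  if (docs.length === 0) return [];
  const tokenized = docs.map((d) => tokenize(d.text));
  const N = docs.length;
  const avgdl = tokenized.reduce((sum, toks) => sum + toks.length, 0) / N;

  // Document frequency: in how many documents each term occurs at least once.
  const df = new Map<string, number>();
  for (const toks of tokenized) {
    new Set(toks).forEach((term) => df.set(term, (df.get(term) ?? 0) + 1));
  }

  const queryTerms = tokenize(query);
  const scored = tokenized.map((toks, i) => {
    let score = 0;
    for (const q of queryTerms) {
      const tf = toks.filter((t) => t === q).length;
      if (tf === 0) continue;
      const n = df.get(q) ?? 0;
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // Term frequency saturates via k1; b normalizes by document length.
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * toks.length) / avgdl));
    }
    return { id: docs[i].id, score };
  });
  // Highest score first.
  return scored.sort((x, y) => y.score - x.score);
}
```

For the query "search ranking" over the documents "search engine ranking with BM25", "content delivery over a CDN" and "search search search", the first document ranks highest even though the third repeats "search" three times: term frequency saturates, and the rarer term "ranking" carries a higher IDF.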
Query with relations
One thing we have created is a way to easily query the relations in your content, not only within a single index containing different content types, but also across indices with potentially different data sources. There is a reason this service is called Content Graph, after all. One of the first relation types you can query is the parent-child relation: a single query returns information about both a parent document and its linked child documents. The powerful part is that you can filter, run full-text search and add facets while querying this relation type, all in a single query. Content can be queried as a graph.
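The shape of a parent-child query result can be sketched in memory: filter the parents, full-text match their children, and return each parent together with its matching children. The document shapes, the queryWithChildren function and the substring-based keyword match are hypothetical simplifications; this only illustrates what such a query returns, not how Content Graph executes it.

```typescript
// Hypothetical document shapes; not the Content Graph schema.
interface Parent { id: string; title: string; category: string }
interface Child { id: string; parentId: string; body: string }

// Return parents passing parentFilter together with the children whose body
// contains every keyword, keeping only parents that have at least one such child.
function queryWithChildren(
  parents: Parent[],
  children: Child[],
  parentFilter: (p: Parent) => boolean,
  keywords: string[],
): { parent: Parent; children: Child[] }[] {
  // Index children by parent id once, so each parent lookup is a Map access.
  const byParent = new Map<string, Child[]>();
  for (const c of children) {
    const list = byParent.get(c.parentId) ?? [];
    list.push(c);
    byParent.set(c.parentId, list);
  }
  const kw = keywords.map((k) => k.toLowerCase());
  return parents
    .filter(parentFilter)
    .map((parent) => ({
      parent,
      // Simplified "full-text" match: case-insensitive substring check per keyword.
      children: (byParent.get(parent.id) ?? []).filter((c) =>
        kw.every((k) => c.body.toLowerCase().includes(k)),
      ),
    }))
    .filter((r) => r.children.length > 0);
}
```

Filtering on the parent ("category is docs") and searching the children ("graphql") happen in one call and come back as one nested result, which is the point of querying content as a graph.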
Besides search, we offer different ways to navigate. You can obviously move through your search space with pagination and cursors, but more interestingly, you can use facets. This means you can present very different views of your content just by tweaking the ranking and/or adding facets and filters. Facets also support multi-select, which is conventional in e-commerce: a great navigation technique that lets you zoom in and zoom out on your data at the same time. It is also a great way to distribute diverse (personalized) content to visitors across different channels, and to other systems such as search engines that use web crawlers.
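Multi-select faceting is conventionally implemented by computing each facet's counts with the selections on every other facet applied, but not its own, so visitors still see alternative values they can add. Here is a minimal in-memory sketch of that convention. The names multiSelectFacets, Item and Selection and the flat data shape are hypothetical and not part of the Content Graph API.

```typescript
// Sketch of conventional multi-select faceting over in-memory data.
// Hypothetical shapes: each item is a flat map of field -> value.
type Item = Record<string, string>;
type Selection = Record<string, string[]>; // field -> selected values

// OR within a field, AND across fields; skipField ignores one field's own selection.
function matches(item: Item, sel: Selection, skipField?: string): boolean {
  return Object.entries(sel).every(
    ([field, values]) => field === skipField || values.length === 0 || values.includes(item[field]),
  );
}

function multiSelectFacets(items: Item[], sel: Selection, facetFields: string[]) {
  // Result list: every selection applies.
  const hits = items.filter((it) => matches(it, sel));
  const facets = new Map<string, Map<string, number>>();
  for (const field of facetFields) {
    // Facet counts: apply the other fields' selections but not this field's own,
    // so unselected values stay visible and can be added to the selection.
    const counts = new Map<string, number>();
    for (const it of items.filter((x) => matches(x, sel, field))) {
      const value = it[field];
      if (value !== undefined) counts.set(value, (counts.get(value) ?? 0) + 1);
    }
    facets.set(field, counts);
  }
  return { hits, facets };
}
```

With "Article" and "en" selected on four items (an English and a German article, an English page and an English product), the only hit is the English article, yet the type facet still lists Page and Product (counted among English items), so a visitor can widen the selection instead of starting over.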
The challenge for us is to create a solution that is generic yet flexible enough to support the many different use cases that are possible.
“As such, there is no one-size-fits-all approach that anyone can offer you. The hot water that softens a carrot will harden an egg.” – Clayton M. Christensen
We believe the current set of search features already lets you build a very good search engine. We will offer more state-of-the-art, out-of-the-box search features that make the search engine “more intelligent”, and you will be able to opt in to AI search, giving you a head start. Advances in conversational AI, such as ChatGPT, show that the search domain is continuously improving and changing. I have written a short blog post on this topic before.
But we also realize that you need the control required to configure the best search for your context and business cases, for example by configuring synonyms and field boosting, which we already support. We intend to add more search features that give you control over the nuts and bolts so you can customize further; I will explain more in a future blog post. We will keep offering new search capabilities that make a big difference for our customers and go beyond what we offer now. Have ideas or a wishlist? You are very welcome to share them with us.
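A minimal sketch of how synonym expansion and field boosting interact at query time, assuming a one-directional synonym table and per-field weights multiplied by the number of matching tokens. The scoring is deliberately simplistic (no BM25), and expandWithSynonyms, boostedScore and the data shapes are hypothetical, not Content Graph's configuration format.

```typescript
// Sketch: query-time synonym expansion plus per-field boosting.
// Hypothetical names and data shapes; scoring is a weighted match count, not BM25.
type Doc = { id: string; fields: Record<string, string> };

// One-directional expansion: each query term also matches its listed synonyms.
function expandWithSynonyms(terms: string[], synonyms: Map<string, string[]>): Set<string> {
  const expanded = new Set<string>();
  for (const raw of terms) {
    const term = raw.toLowerCase();
    expanded.add(term);
    for (const s of synonyms.get(term) ?? []) expanded.add(s.toLowerCase());
  }
  return expanded;
}

// Each matching token contributes its field's boost (default 1).
function boostedScore(doc: Doc, terms: Set<string>, boosts: Map<string, number>): number {
  let score = 0;
  for (const [field, text] of Object.entries(doc.fields)) {
    const matched = text.toLowerCase().split(/\W+/).filter((tok) => terms.has(tok)).length;
    score += matched * (boosts.get(field) ?? 1);
  }
  return score;
}
```

With the synonym "sneakers" → "trainers" and a title boost of 3, a search for "sneakers" now finds a document titled "Trainers for running", and ranks it above a document that mentions "sneakers" only in its body. Without the synonym, the first document would not match at all.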
Want to see some GraphQL queries before getting started? The Content Graph developer documentation, which describes its query language and includes plenty of example queries, is available on our ReadMe.
Content Graph is available for all Optimizely DXP customers. Feel free to get started.
Help us find bugs! Users of the beta release are encouraged to report any bugs to our support team.
Feature and change requests are warmly welcome as well. You can submit ideas and feedback here.