Ben Nitti
Oct 6, 2021

How to exclude pages from your search index, sitemaps and internet search engines

Your client wants certain pages excluded from the site's search results, kept from being crawled by external search engines (Google, Bing, etc.), and removed from the sitemap.xml file. All three can be controlled with a single checkbox property on the Settings tab in the editor, as demonstrated below.

Create an interface that inherits IContent and exposes a boolean property

	public interface IDisableIndex : IContent
	{
		bool DisableIndex { get; set; }
	}

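Implement it on your base page class or on any individual page type

	using System.ComponentModel.DataAnnotations;
	using EPiServer.Core;
	using EPiServer.DataAbstraction;
	using EPiServer.DataAnnotations;

	public abstract class SitePageData : PageData, IDisableIndex
	{
		// Checkbox shown on the Settings tab; stored per language branch.
		[CultureSpecific]
		[Display(Name = "Disable Indexing",
			Description = "Removes the page from search index, sitemap and search engines",
			GroupName = SystemTabNames.Settings,
			Order = 10)]
		public virtual bool DisableIndex { get; set; }
	}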

Optimizely Search & Navigation

You can keep these pages out of your Optimizely Search & Navigation index by creating an initialization module for your search conventions. The convention targets the interface, so it applies to every content type that implements it.

	[ModuleDependency(typeof(IndexingModule))]
	public class FindConventionsInitialization : IInitializableModule
	{
		public void Initialize(InitializationEngine context)
		{
			// Skip any content whose "Disable Indexing" checkbox is ticked.
			ContentIndexer.Instance.Conventions.ForInstancesOf<IDisableIndex>().ShouldIndex(x => !x.DisableIndex);
		}

		public void Uninitialize(InitializationEngine context) { }
	}

External Search Engines

Use the same boolean property to add instructions for search robots inside the <head></head> element of your layout view

<head>
    @if (Model.DisableIndex)
    {
        <meta name="ROBOTS" content="noindex, nofollow" />
    }
</head>
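
The layout snippet above assumes the view's Model always exposes DisableIndex. Where a layout is shared by models that may not implement the interface, a small helper can make the check safe. The following is only a sketch: RobotsMeta, ContentFor and StubPage are hypothetical names that do not appear in the original post, and the interface is redeclared without IContent so the example compiles on its own.

```csharp
using System;

// Stand-in for the article's interface so this sketch compiles on its own.
// In the real project the interface also inherits EPiServer.Core.IContent.
public interface IDisableIndex
{
    bool DisableIndex { get; set; }
}

// Hypothetical helper; not part of Optimizely or of the original post.
public static class RobotsMeta
{
    // Returns the robots directive for a model, or null when no tag should be rendered.
    public static string ContentFor(object model)
    {
        return model is IDisableIndex page && page.DisableIndex
            ? "noindex, nofollow"
            : null;
    }
}

// Hypothetical page type used only to exercise the helper.
public class StubPage : IDisableIndex
{
    public bool DisableIndex { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Checkbox ticked: the directive is returned.
        Console.WriteLine(RobotsMeta.ContentFor(new StubPage { DisableIndex = true }) ?? "(no tag)");
        // Checkbox not ticked: nothing to render.
        Console.WriteLine(RobotsMeta.ContentFor(new StubPage()) ?? "(no tag)");
        // Model that does not implement the interface: nothing to render.
        Console.WriteLine(RobotsMeta.ContentFor(new object()) ?? "(no tag)");
    }
}
```

In a layout, the returned string could be placed in the content attribute of the meta tag, and the tag skipped when the helper returns null.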

Sitemaps

If you're using the Geta Sitemap generator, you can extend it to skip these pages when the XML file is generated. Create a class that inherits from the abstract base SitemapXmlGenerator class and implements the ICommerceAndStandardSitemapXmlGenerator interface. Override the AddFilteredContentElement method and, from there, exclude any page that implements IDisableIndex and has the checkbox ticked.

	public class CommerceAndStandardSitemapXmlGenerator : SitemapXmlGenerator, ICommerceAndStandardSitemapXmlGenerator
	{
		public CommerceAndStandardSitemapXmlGenerator(
			ISitemapRepository sitemapRepository, 
			IContentRepository contentRepository, 
			UrlResolver urlResolver, 
			ISiteDefinitionRepository siteDefinitionRepository, 
			ILanguageBranchRepository languageBranchRepository, 
			IContentFilter contentFilter) 
			: base(sitemapRepository,  contentRepository, urlResolver, siteDefinitionRepository, languageBranchRepository, contentFilter)
		{
		}

		//Filter content from xml sitemap
		protected override void AddFilteredContentElement(CurrentLanguageContent languageContentInfo, IList<XElement> xmlElements)
		{
			var sitemapContent = languageContentInfo.Content as IDisableIndex;

			if (sitemapContent != null && sitemapContent.DisableIndex)
			{
				return;
			}

			base.AddFilteredContentElement(languageContentInfo, xmlElements);
		}
	}
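
The post does not show how the custom generator gets picked up. One plausible approach, sketched here under assumptions, is to register it in the IoC container from a configurable module. This assumes the sitemap package resolves its generator from the container through ICommerceAndStandardSitemapXmlGenerator, that the interface lives in the Geta.SEO.Sitemaps.XML namespace, and that the project is a CMS 11-style site; none of this is confirmed by the original post, and the module name SitemapGeneratorRegistration is hypothetical.

```csharp
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using EPiServer.ServiceLocation;
using Geta.SEO.Sitemaps.XML; // assumed namespace of ICommerceAndStandardSitemapXmlGenerator

// Hypothetical module; not part of the original post.
[InitializableModule]
[ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
public class SitemapGeneratorRegistration : IConfigurableModule
{
    public void ConfigureContainer(ServiceConfigurationContext context)
    {
        // Assumption: the package asks the container for this interface, so
        // registering the custom class here replaces the default generator.
        // CommerceAndStandardSitemapXmlGenerator refers to the class defined
        // above in your own project, not to a class shipped with the package.
        context.Services.AddTransient<ICommerceAndStandardSitemapXmlGenerator, CommerceAndStandardSitemapXmlGenerator>();
    }

    // Nothing to do at initialization time; the work happens in ConfigureContainer.
    public void Initialize(InitializationEngine context) { }

    public void Uninitialize(InitializationEngine context) { }
}
```

If the sitemap still contains the excluded pages after the next scheduled generation, check which generator type the package actually resolves for your configured sitemap format, since that decides which interface needs the registration.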