Henrik Fransas
May 16, 2015

Adding EPiServer Find to Alloy - Part 2

In this blog series I am going to show you, step by step, how to add EPiServer Find to a project and how to use it, including its most important functions.

Part 2 - Filtering what to index

Setting up Find to index only what you want is pretty easy once you have done it a couple of times. It should be set up at application start, so you can do it all in Global.asax in the Application_Start function. I prefer to do it either in a static function that I call from Application_Start or, even better, in an InitializableModule that EPiServer runs automatically on startup. In this example we are going to use an InitializableModule.

Start by creating a new class called EPiServerFindInitialization in the folder Initialization below Business. Let it implement the interface IInitializableModule and decorate it with [ModuleDependency(typeof(InitializationModule))]. Make sure it implements the functions that the interface requires. When you are done, you will end up with a class similar to this:

using System;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using InitializationModule = EPiServer.Web.InitializationModule;

namespace AlloyWithFind.Business.Initialization
{
    [ModuleDependency(typeof(InitializationModule))]
    public class EPiServerFindInitialization : IInitializableModule
    {
        public void Initialize(InitializationEngine context)
        {
            throw new NotImplementedException();
        }

        public void Uninitialize(InitializationEngine context)
        {
            throw new NotImplementedException();
        }
    }
}

Right now it does not do much, so let's start implementing things. Start by removing the throw new NotIm... line from both functions (if you have it).

In the Initialize function, start by adding this line of code:

// Exclude all content from being indexed
ContentIndexer.Instance.Conventions.ForInstancesOf<IContent>().ShouldIndex(x => false);

As the comment says, this excludes all content from being indexed. That may seem strange, but I prefer it because it gives me total control over what content is indexed. If you would rather exclude only the types you do not want indexed, that is fine too. Please leave a comment on the post with your thoughts. After you have done this, rebuild your solution, run it and re-run the indexing job (EPiServer Find Content Indexing Job). When the job is done, go to the Find tab in admin and open the Overview section again. There should now be nothing in the index.
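Here is a toy model in plain C# that makes the deny-by-default idea concrete. Every type is excluded unless a more specific rule opts it back in. The class names (ContentItem, SitePageItem and so on) and the IndexPolicy type are hypothetical. For simplicity the lookup walks base classes only and ignores interfaces. It illustrates the idea and does not describe how EPiServer Find resolves its conventions internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical content hierarchy used only for this illustration.
public class ContentItem { }
public class SitePageItem : ContentItem { public bool IsPublished { get; set; } }
public class ArticlePageItem : SitePageItem { }
public class MediaItem : ContentItem { }

// Toy deny-by-default policy: the rule registered for the most specific
// base class of an item decides; with no matching rule the item is excluded.
public class IndexPolicy
{
    private readonly Dictionary<Type, Func<object, bool>> _rules =
        new Dictionary<Type, Func<object, bool>>();

    public void Register<T>(Func<T, bool> rule)
    {
        // Registering the same type again replaces the earlier rule.
        _rules[typeof(T)] = item => rule((T)item);
    }

    public bool ShouldIndex(object item)
    {
        // Walk from the runtime type up the base-class chain; the first rule found wins.
        for (Type type = item.GetType(); type != null; type = type.BaseType)
        {
            Func<object, bool> rule;
            if (_rules.TryGetValue(type, out rule))
            {
                return rule(item);
            }
        }

        return false; // deny by default
    }
}
```

Registering `policy.Register<ContentItem>(x => false)` and then `policy.Register<SitePageItem>(p => p.IsPublished)` mirrors the post: media stays out, and published pages are let back in.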

Next, create a new function in your class called ShouldIndexPagedata that takes a PageData object as its parameter.
For this to work, also create a dynamic property in admin called MetaRobots. We use this property to tell EPiServer Find, and all other search engines, whether to index the page and follow its links. We have it as a dynamic property because that makes it easy to disable indexing for a whole part of the tree, but it can also be a property on each page.

We use it like this in our layout page:

// MetaRobots
    var robots = Model.Page.GetPropertyValue<string>("MetaRobots", "INDEX, FOLLOW");
    if (!string.IsNullOrWhiteSpace(robots) && robots != "INDEX, FOLLOW")
    {
        @Html.Raw(HtmlHelpers.RenderMetaData("ROBOTS", robots))  
    }

Where RenderMetaData looks like this:

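public static string RenderMetaData(string name, string value)
{
    // Attribute-encode both values so a quote in an editor-entered value cannot break the tag
    return string.Format("<meta name=\"{0}\" content=\"{1}\" />",
        System.Web.HttpUtility.HtmlAttributeEncode(name),
        System.Web.HttpUtility.HtmlAttributeEncode(value));
}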

With these few lines of code we help Google and other search engines index only the content we want them to index.
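To unit test the layout logic outside a view, you can pull it into a plain C# helper. This is a minimal sketch. It assumes the MetaRobots value is passed in as a string (null when not set) and that "INDEX, FOLLOW" means no tag is rendered, as in the layout snippet. RobotsMetaSketch and RenderRobots are hypothetical names. The helper HTML-encodes the values with WebUtility.HtmlEncode.

```csharp
using System.Net;

// Hypothetical helper mirroring the layout snippet: render a ROBOTS meta tag
// only when the value differs from the default "INDEX, FOLLOW".
public static class RobotsMetaSketch
{
    public const string DefaultRobots = "INDEX, FOLLOW";

    public static string RenderMetaData(string name, string value)
    {
        // HtmlEncode escapes quotes, ampersands and angle brackets in the attribute values.
        return string.Format("<meta name=\"{0}\" content=\"{1}\" />",
            WebUtility.HtmlEncode(name), WebUtility.HtmlEncode(value));
    }

    // robotsValue is the MetaRobots value, or null when the property is not set.
    public static string RenderRobots(string robotsValue)
    {
        string robots = robotsValue ?? DefaultRobots;
        if (string.IsNullOrWhiteSpace(robots) || robots == DefaultRobots)
        {
            return string.Empty;
        }

        return RenderMetaData("ROBOTS", robots);
    }
}
```

In the view you would then output `@Html.Raw(RobotsMetaSketch.RenderRobots(...))` with the property value.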
Back to the Find implementation: your new function should look like this:

private bool ShouldIndexPagedata(PageData page)
{
    //Check if the page is published, not marked as noindex and does not implement IContainerPage
    var shouldIndex = page.CheckPublishedStatus(PagePublishedStatus.Published) 
        && page.GetPropertyValue<string>("MetaRobots") != "NOINDEX, NOFOLLOW"
        && !(page is IContainerPage);

    //Disabled: deleting here removes items from the index that are then added back again
    //If the page should not be indexed, try to delete it if it exists in the index
    //if (!shouldIndex)
    //{
    //   IEnumerable<EPiServer.Find.Api.DeleteResult> result;
    //    ContentIndexer.Instance.TryDelete(page, out result);
    //}

    return shouldIndex;
}

You could leave the delete part out, but then you have to wait for the next complete reindex to get content removed from the index. I tend to keep it even though it makes each save take a little longer. As you can see, I do not check whether the delete succeeded. If an object does not exist in the index, the call returns false, which I do not care about. If it fails for any other reason, the next full reindex will clean it up.
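ShouldIndexPagedata compares MetaRobots with the exact string "NOINDEX, NOFOLLOW". A page set to "NOINDEX, FOLLOW", or typed with different casing or spacing, is therefore still indexed. If you want any noindex directive to keep a page out of Find, one option is a token-based check like the sketch below. RobotsDirectives.AllowsIndexing is a hypothetical helper, not part of EPiServer. Treating "none" as noindex follows the usual robots meta tag convention.

```csharp
using System;
using System.Linq;

// Hypothetical helper: decides indexing from a robots value such as "NOINDEX, FOLLOW".
public static class RobotsDirectives
{
    public static bool AllowsIndexing(string robots)
    {
        // No value set: treat it like the default "INDEX, FOLLOW".
        if (string.IsNullOrWhiteSpace(robots))
        {
            return true;
        }

        // Split on commas, ignore spacing and casing, and look for a directive that forbids indexing.
        return !robots
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => token.Trim())
            .Any(token => string.Equals(token, "NOINDEX", StringComparison.OrdinalIgnoreCase)
                       || string.Equals(token, "NONE", StringComparison.OrdinalIgnoreCase));
    }
}
```

In ShouldIndexPagedata you would then replace the string comparison with `RobotsDirectives.AllowsIndexing(page.GetPropertyValue<string>("MetaRobots"))`.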

When you are done with that, it is time to get content indexed again. In your Initialize function, add this line of code:

ContentIndexer.Instance.Conventions.ForInstancesOf<SitePageData>().ShouldIndex(ShouldIndexPagedata);

This runs the function ShouldIndexPagedata for every content type that inherits from SitePageData. Passing just the name of the function is a shorter way of writing x => ShouldIndexPagedata(x). Save your work, rebuild, run the site and run the scheduled job EPiServer Find Content Indexing Job again. Voilà: the Overview section of the Find interface now shows plenty of content again.
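The method group shorthand can be shown in isolation. In this sketch PageStub and SitePageStub are hypothetical stand-ins for PageData and SitePageData. The sketch assumes ShouldIndex accepts a Func<T, bool> for the type passed to ForInstancesOf. It also shows why a function that takes the base type can be passed for the derived type. C# allows a method group whose parameter is a base class to be converted to a delegate over a derived class.

```csharp
using System;

// Hypothetical stand-ins for PageData and SitePageData.
public class PageStub
{
    public bool IsPublished { get; set; }
}

public class SitePageStub : PageStub { }

public static class MethodGroupDemo
{
    // Takes the base type, like ShouldIndexPagedata(PageData page) in the post.
    public static bool ShouldIndexPagedata(PageStub page)
    {
        return page.IsPublished;
    }

    // Method group: converted to a delegate over the derived type.
    public static readonly Func<SitePageStub, bool> ViaMethodGroup = ShouldIndexPagedata;

    // The equivalent lambda written out in full.
    public static readonly Func<SitePageStub, bool> ViaLambda = x => ShouldIndexPagedata(x);
}
```

Both delegates end up calling the same method. The method group form just saves the lambda wrapper.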

Now all the pages we want indexed are in the index, but what about uploaded files? This is a bit trickier if you do not want all uploaded media indexed (and on a big site, you probably do not). Henrik Lindström wrote an excellent blog post on how to index only linked objects in EPiServer 7, which I updated for version 7.5, and that is what we are going to do here. For each media item, the function checks the soft link repository for links to it. If there are any, it checks that the linking content is published and should be indexed. It also checks the file extension against a list of accepted extensions. The function is not perfect, but it works well.
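Without the EPiServer APIs, the rule for media comes down to two checks. The sketch below models them in plain C#. It assumes the file extension is given without the leading dot, as in the accepted list in the class below, and it reduces each linking item to two flags. LinkOwner and MediaIndexRule are hypothetical types. The real code loads the owners through ContentSoftLinkRepository instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified view of a content item that links to the file.
public class LinkOwner
{
    public bool ShouldIndex { get; set; }
    public bool IsPublished { get; set; }
}

public static class MediaIndexRule
{
    // Same list as the post; the comparer makes the check case-insensitive.
    private static readonly HashSet<string> AcceptedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "doc", "docx", "ppt", "pptx", "pdf", "xls", "xlsx" };

    public static bool ShouldIndexMedia(string fileExtension, IEnumerable<LinkOwner> owners)
    {
        // Check 1: the file type must be on the accepted list.
        if (string.IsNullOrEmpty(fileExtension) || !AcceptedExtensions.Contains(fileExtension))
        {
            return false;
        }

        // Check 2: at least one linking item must be indexable and published.
        // Any() stops at the first match, like the early "return true" in the loop.
        return owners.Any(owner => owner.ShouldIndex && owner.IsPublished);
    }
}
```

A file that nothing links to is never indexed, even if its extension is accepted.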

When you are done, your class should look something like this:

using System;
using System.Collections.Generic;
using AlloyWithFind.Business.Rendering;
using AlloyWithFind.Models.Pages;
using EPiServer;
using EPiServer.Core;
using EPiServer.DataAbstraction;
using EPiServer.Find.Api;
using EPiServer.Find.Cms;
using EPiServer.Find.Cms.Conventions;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using EPiServer.Logging;
using EPiServer.ServiceLocation;
using InitializationModule = EPiServer.Web.InitializationModule;

namespace AlloyWithFind.Business.Initialization
{
    [ModuleDependency(typeof(InitializationModule))]
    public class EPiServerFindInitialization : IInitializableModule
    {
        public void Initialize(InitializationEngine context)
        {
            var acceptedFileExtensions = new List<string>() { "doc", "docx", "ppt", "pptx", "pdf", "xls", "xlsx" };

            // Exclude all from indexing
            ContentIndexer.Instance.Conventions.ForInstancesOf<IContent>().ShouldIndex(x => false);

            ContentIndexer.Instance.Conventions.ForInstancesOf<SitePageData>().ShouldIndex(ShouldIndexPagedata);

            ContentIndexer.Instance.Conventions.ForInstancesOf<IContentMedia>().ShouldIndex(x =>
            {
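                // Skip files with no extension or an extension that is not on the accepted list
                var extension = x.SearchFileExtension();
                if (string.IsNullOrEmpty(extension) || !acceptedFileExtensions.Contains(extension.ToLowerInvariant()))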
                {
                    return false;
                }

                var contentRepository = ServiceLocator.Current.GetInstance<IContentRepository>();
                var contentSoftLinkRepository = ServiceLocator.Current.GetInstance<ContentSoftLinkRepository>();

                var softLinks = contentSoftLinkRepository.Load(x.ContentLink, true);

                try
                {
                    foreach (var softLink in softLinks)
                    {

                        if (softLink.SoftLinkType == ReferenceType.ExternalReference || softLink.SoftLinkType == ReferenceType.PageLinkReference)
                        {
                            var content = contentRepository.Get<IContent>(softLink.OwnerContentLink);
                            
                            if (content == null)
                                continue;

                            var shouldIndex = ContentIndexer.Instance.Conventions.ShouldIndexConvention.ShouldIndex(content);
                            if (shouldIndex != null && !shouldIndex.Value) // don't index referenced file if content is marked as not indexed
                            {
                                continue;
                            }

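                            // Only index if the linking content is published in the link's language.
                            // "sv" is the fallback language on this site; change it to your site's master language.
                            var publicationStatus = content.PublishedInLanguage()[(softLink.OwnerLanguage != null ? softLink.OwnerLanguage.Name : "sv")];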

                            if (publicationStatus != null &&
                                (publicationStatus.StartPublish == null ||
                                 publicationStatus.StartPublish < DateTime.Now) &&
                                (publicationStatus.StopPublish == null ||
                                 DateTime.Now < publicationStatus.StopPublish))
                            {
                                return true;
                            }
                        }
                    }
                }
                catch (Exception exception)
                {
                    var logger = LogManager.GetLogger();
                    logger.Error("Error on indexing file", exception);
                }

                return false;
            });
        }

        public void Uninitialize(InitializationEngine context)
        {
            
        }

        private bool ShouldIndexPagedata(PageData page)
        {
            //Check if the page is published, not marked as noindex and does not implement IContainerPage
            var shouldIndex = page.CheckPublishedStatus(PagePublishedStatus.Published) 
                && page.GetPropertyValue<string>("MetaRobots") != "NOINDEX, NOFOLLOW"
                && !(page is IContainerPage);

            //If the page should not be indexed, try to delete it if it exists in the index
            //if (!shouldIndex)
            //{
            //    IEnumerable<DeleteResult> result;
            //    ContentIndexer.Instance.TryDelete(page, out result);
            //}

            return shouldIndex;
        }
    }
}

It takes a bit more code to get working, but once you have read it through a couple of times it is pretty straightforward.
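The publication check near the end of the media convention is easier to test if you pull it out into a pure function that receives the current time instead of reading DateTime.Now. PublicationWindow.IsLive is a hypothetical helper that mirrors the inline condition: a missing start or stop date counts as open-ended.

```csharp
using System;

// Hypothetical helper mirroring the inline StartPublish/StopPublish condition.
public static class PublicationWindow
{
    public static bool IsLive(DateTime? startPublish, DateTime? stopPublish, DateTime now)
    {
        // A null start means "already started"; a null stop means "never expires".
        return (startPublish == null || startPublish < now)
            && (stopPublish == null || now < stopPublish);
    }
}
```

Passing `now` in as a parameter keeps the helper deterministic in tests. In the convention you would call it with DateTime.Now.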

In the next part we will start creating the search service class and interface that handle the communication with EPiServer Find.


Comments

K Khan May 17, 2015 07:36 PM

I will bookmark this for training. Thanks!

Henrik Fransas May 17, 2015 08:23 PM

I am glad you like it!
Later this year I will teach the EPiServer Find course for developers at EPiServer. It is a great one-day course.

May 19, 2015 08:01 AM

You might want to index unpublished content to be able to search for it in edit mode. FilterForVisitor could be used on the site to filter the unpublished versions.

Henrik Fransas May 19, 2015 10:48 AM

That is true, Guest.
You could do that, but I prefer to filter content on the way into the index. Otherwise I would need to filter out a lot more in every search, since I do not want to search in all published content.

For search inside EPiServer I use EPiServer Search, even when the site is published to Azure.

Fang Huang Apr 5, 2017 10:49 AM

Thanks a lot for this great page! It makes my life much easier with automatic index filtering at EPiServer startup :-)
