Episerver Find - enhanced indexing

Hi All,

Our site recently went through a content consolidation, and the standard page-level search results are no longer enough to find specific content within a page.

I can manipulate the index using the HTML Agility Pack to index a page multiple times, once per H2 tag, using the content of the P directly under it as the precis/excerpt.

The question is: how much of Find's automation will I lose (for example when a page is published, deleted or expired), and should I try to override the Find indexing task?

Or do I just write my own task, disable the current Find indexing task, and then handle all of the page events myself?

Appreciate your advice in advance.

Regards,

Paul

#258896
Jul 14, 2021 1:35

What are you trying to index, the HTML of a page? If so, I'd just create some extension methods on your model type and index through those, or build custom index models yourself: https://world.optimizely.com/documentation/developer-guides/search-navigation/NET-Client-API/Customizing-serialization/
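
A minimal sketch of the extension-method approach, under a few assumptions: `ArticlePage`, `MainBodyHtml` and `SearchText()` are hypothetical names, not from this thread, and the model is a plain class so the sample compiles without the CMS packages. The extension method flattens the body HTML into plain text. With Find, such a method is typically registered through the `IncludeField` convention, shown here only as a comment.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical page model; in a real site this would be a PageData subclass.
public class ArticlePage
{
    public string MainBodyHtml { get; set; }
}

public static class ArticlePageSearchExtensions
{
    // Returns the body as plain text so it can be indexed as an extra field.
    // Regex tag stripping is crude; an HTML parser is safer for real markup.
    public static string SearchText(this ArticlePage page)
    {
        if (string.IsNullOrWhiteSpace(page?.MainBodyHtml))
        {
            return string.Empty;
        }

        var withoutTags = Regex.Replace(page.MainBodyHtml, "<[^>]+>", " ");
        var decoded = WebUtility.HtmlDecode(withoutTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}

// Registration with Find (assumption: run once at start-up, for example in an
// initialization module):
// SearchClient.Instance.Conventions
//     .ForInstancesOf<ArticlePage>()
//     .IncludeField(x => x.SearchText());
```

This keeps the built-in indexing job and its publish/delete handling in place, since the page is still indexed as a single document with one extra field.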

#258910
Jul 14, 2021 8:21

Hi all, 

In case this helps someone in the future, I've included the solution below.

The first part indexes external content, the second part indexes all H2 and text as separate items.

It's greatly improved the search results for us:

    [ScheduledPlugIn(
        DisplayName = "* Index page fragments",
        Description = "",
        SortIndex = 0,
        DefaultEnabled = true,
        InitialTime = "1.1:0:0",
        IntervalLength = 24,
        IntervalType = ScheduledIntervalType.Hours)]
    public class FindIndexCustomSite : ScheduledJobBase
    {
        private bool _stopSignaled;
        // Instance field: the client is injected per job instance, so it should not be static.
        private readonly IClient _client;
        private static readonly ILogger _logger = LogManager.GetLogger();

        public FindIndexCustomSite(IClient client)
        {
            _client = client;
            IsStoppable = true;
        }

        /// <summary>
        /// Called when a user clicks on Stop for a manually started job, or when ASP.NET shuts down.
        /// </summary>
        public override void Stop()
        {
            _stopSignaled = true;
            base.Stop();
        }

        /// <summary>
        /// Called when a scheduled job executes
        /// </summary>
        /// <returns>A status message to be stored in the database log and visible from admin mode</returns>
        public override string Execute()
        {
            //Call OnStatusChanged to periodically notify progress of job for manually started jobs
            OnStatusChanged("Starting execution of indexing page fragment");

            try
            {
                // Get the data from the Customs website...
                var apiKey = SiteSettingService.Instance.SettingsPage.CustomsAPIKey;
                var client = new RestClient(
                    ConfigurationManager.AppSettings["CustomsAPIProjectsUrl"] +
                    "?project_type=Public&project_status=Active|Open");
                client.Timeout = -1;
                var request = new RestRequest(Method.GET);
                request.AddHeader("authtoken", apiKey);
                IRestResponse response = client.Execute(request);

                var Customs = CustomsProject.FromJson(response.Content);


                var listOfProjects = new List<SearchFragments>();

                _client.Delete<SearchFragments>(y => y.SearchFragmentType.MatchCaseInsensitive("custom"));


                if (Customs != null && Customs.Result != null && Customs.Result.Count > 0)
                {
                    foreach (var proj in Customs.Result)
                    {
                        listOfProjects.Add(
                            new SearchFragments()
                            {
                                Description = proj.Description,
                                Name = proj.Name,
                                DateCreated = proj.Created,
                                DateModified = proj.LastUpdated,
                                ImageUrl = proj.ProjectImage?.ToString(),
                                ThumbnailUrl = proj.ProjectThumbnail?.ToString(),
                                ProjectCategories = proj.Attributes.ProjectCategory,
                                Suburbs = proj.Attributes.ProjectLocation,
                                Url = proj.Url?.ToString(),
                                SearchHitUrl = proj.Url?.ToString(),
                                SearchSection = "Customs",
                                SearchText = proj.Description,
                                SearchSummary = proj.Description,
                                SearchTitle = "Customs: " + proj.Name,
                                SearchFragmentType = "custom"
                            });
                    }

                    foreach (var item in listOfProjects)
                    {
                        _client.Index(item);
                    }
                }

                // Index page fragments
                var pgs = ContentService.Instance.GetDescendants<SitePageData>(ContentReference.StartPage, 10);
                var counter = 0;
                var listOfPagesFragments = new List<SearchFragments>();
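                foreach (var pg in pgs)
                {
                    // Honour a stop request between pages rather than only after all work is done.
                    if (_stopSignaled)
                    {
                        return "Stop of job was called";
                    }
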
                    if (pg.MainBody != null)
                    {
                        var html = pg.MainBody;

                        if (!string.IsNullOrEmpty(html.ToString()))
                        {
                            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
                            doc.LoadHtml(html.ToString());
                            
                            var headers = doc.DocumentNode.SelectNodes("//h2");

                            if (headers != null)
                            {
                                foreach (HtmlNode item in headers)
                                {
                                    var mainHeading = HttpUtility.HtmlDecode(item.InnerText.Trim());
                                    var text = "";
                                    // editors can apply this class to H2 headings to remove from index and "on this page" so we ignore it.
                                    if (item.Attributes.Any(x => x.Value == "hideFromAuto"))
                                    {
                                        continue;
                                    }
                                    // Collect the text of every sibling up to, but not including, the next H2,
                                    // so the following heading does not leak into this fragment's summary.
                                    var next = item.NextSibling;
                                    while (next != null && next.Name != "h2")
                                    {
                                        if (next.FirstChild != null)
                                        {
                                            text = text + " " + HttpUtility.HtmlDecode(next.InnerText.Trim());
                                        }

                                        next = next.NextSibling;
                                    }

                                    if (!string.IsNullOrEmpty(text) && !string.IsNullOrEmpty(mainHeading))
                                    {
                                        listOfPagesFragments.Add(
                                            new SearchFragments()
                                            {
                                                Description = pg.PageTitle,
                                                Name = pg.PageName,
                                                DateCreated = pg.Created,
                                                DateModified = pg.Changed,
                                                Url = UrlResolver.Current.GetUrl(pg.PageLink),
                                                SearchHitUrl = UrlResolver.Current.GetUrl(pg.PageLink),
                                                SearchSection = "pagefragment",
                                                SearchText = text,
                                                SearchSummary = text,
                                                SearchTitle = mainHeading,
                                                SearchFragmentType = "custom"
                                            });
                                    }
                                }
                            }
                        }
                    }
                }

                foreach (var item in listOfPagesFragments)
                {
                    _client.Index(item);
                }
            }
            catch (Exception ex)
            {
                _logger.Error("Index Search Fragment Item Error: " + ex.ToString());

                return "Error. Check logs for details.";
            }

            //For long running jobs periodically check if stop is signaled and if so stop execution
            if (_stopSignaled)
            {
                Stop();
                return "Stop of job was called";
            }

            return "Success!";
        }
    }
#259672
Jul 29, 2021 23:43

What part of the page content are you missing when indexing? If it is content in content areas, you can decorate the content types you want included when a page is indexed with the IndexInContentAreas attribute, and EPiServer will take care of it for you. You could even override IContentIndexerConventions.ShouldIndexInContentAreaConvention to handle behaviour specific to your build. For more information, see: https://world.optimizely.com/documentation/developer-guides/search-navigation/Integration/cms-integration/Indexing-content-in-a-content-area/
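
As a rough illustration of the attribute approach, the declarative fragment below marks a block type for inclusion. `PolicyBlock` and its properties are hypothetical names, and the snippet assumes a CMS project that already references the Find CMS integration package.

```csharp
using EPiServer.Core;
using EPiServer.DataAnnotations;
using EPiServer.Find.Cms;

// Hypothetical block type: its properties are indexed together with any page
// that references it from a content area.
[ContentType(DisplayName = "Policy block")]
[IndexInContentAreas]
public class PolicyBlock : BlockData
{
    public virtual string Heading { get; set; }

    public virtual XhtmlString Description { get; set; }
}
```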

#259976
Aug 03, 2021 8:40

It was more about content from multiple pages being merged into one page.

As a generic example, the old site had separate pages for, say, policies:

  • Policy 1 with a description
  • Policy 2 with a description
  • Policy n with a description

This was consolidated into one page:

  • Policies
    • (h2) Policy 1 with a description
    • (h2) Policy 2 with a description
    • (h2) Policy n with a description

This made it difficult to tell from the search results whether you were being directed to the correct page.

The search would return "Policies" when a specific policy was entered, or when the search term matched something slightly different on the page.

This method treats each h2 and its content as an individual search result (pointing to the same page). We also added an auto-scroll to the content by passing the h2 as a parameter when the search result link is clicked.

It's been received really well.
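
For anyone wiring up the result links, here is a minimal sketch of how the heading could be appended to the page URL. The `st` and `q` parameter names match the Razor snippet in this post; the `FragmentLinkBuilder` helper itself is hypothetical and not part of our solution.

```csharp
using System;

public static class FragmentLinkBuilder
{
    // Appends the heading ("st") and the original search query ("q") to the
    // page URL, URL-encoding both so headings with spaces or '&' survive.
    public static string Build(string pageUrl, string heading, string query)
    {
        var separator = pageUrl.Contains("?") ? "&" : "?";
        return pageUrl + separator
            + "st=" + Uri.EscapeDataString(heading ?? string.Empty)
            + "&q=" + Uri.EscapeDataString(query ?? string.Empty);
    }
}
```

For a fragment, the heading stored in `SearchTitle` would be passed as `heading` and the result used as the link target instead of the bare page URL.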

Here is the scrolling script, which also submits the click to Google Analytics:

    @if (!string.IsNullOrEmpty(Request["st"]))
     {
         var isAuto =  Request["a"];
         var scrollTo = Request["st"];
         var originalSearch = Request["q"];
         <text>
             <script>
                 var element = $("#main-content").find("h2:contains('@scrollTo')");
                 if (element.length === 0) {
                     element = $("#main-content").find("h1:contains('@scrollTo')");
                 }
                 if (element.length > 0) {
                     element.scrollTop(0);
                     $('html, body').scrollTop(0);
                     $('html, body').animate({
                             scrollTop: element.offset().top
                         },
                         {
                             // duration belongs in the options object; a third argument is ignored
                             duration: 1000,
                             complete: function() {
                                 $(window).scrollTop(element.offset().top);
                             }
                         });
                     $(function() {
                         ga('send', 'event', 'Search click', '@scrollTo@isAuto', '@originalSearch');
                     });
                 }

             </script>
        </text>
    }
#260024
Aug 03, 2021 23:23