Missing most data on UnifiedSearchHit of VersioningFile

I am doing a UnifiedSearchFor and I can see that I get results of type VersioningFile.

For these UnifiedSearchHit objects the Url and Title properties are empty.

The search view in the Find admin shows the file hits correctly.

I had a similar problem with PageData hits before implementing ISearchContent on my base type.

I have tried projecting the properties in a manner like this, without any luck:

SearchClient.Instance.Conventions.UnifiedSearchRegistry
                .ForInstanceOf<UnifiedFile>()
                .CustomizeProjection(x => x.ProjectUrlFrom<UnifiedFile>(file => file.VirtualPath));

In my init module I have:

SearchClient.Instance.Conventions.UnifiedSearchRegistry.Add(typeof(UnifiedFile));
#63725
Nov 27, 2012 16:27

I'm using 1.0.0.278 and EPi 6.1.379.502.

#63736
Nov 28, 2012 9:33

What happens if you don't add UnifiedFile in your init module?

Files and pages should be automatically added to the UnifiedSearchRegistry with sensible defaults for projections.

#63741
Nov 28, 2012 12:02

I still get 33 items of "EPiServer.Web.Hosting.VersioningFile" in my index. But no difference on UnifiedSearchHit objects.

Correction: UnifiedSearchHit objects for files have disappeared.

As I wrote, I had a similar problem with PageData hits before implementing ISearchContent on my base type, so I might be missing something that happens on app init inside the product assemblies?

#63747
Edited, Nov 28, 2012 13:42

Hmm, my guess is that while Unified Search exists in the .NET 3.5/CMS 6 R2 version of the general .NET API (EPiServer.Find.dll), the automatic registration of pages and files that the .NET 4/CMS 7 version of the CMS integration (EPiServer.Find.Cms.dll) performs doesn't exist in the 6 R2 version.

With CMS 7 it should "just work" :)

Try something like this:

 

.CustomizeProjection(x => x.ProjectUrlFrom<UnifiedFile>(file => GetFileUrl(file.PermanentLinkVirtualPath)))

private static string GetFileUrl(string permanentLinkVirtualPath)
{
  UrlBuilder url = new UrlBuilder(permanentLinkVirtualPath);
  Global.UrlRewriteProvider.ConvertToExternal(url, null, Encoding.UTF8);
  return url.ToString();
}

    

#63749
Nov 28, 2012 14:13

OK! I have this in my init module now, and I can see that the Url prop of UnifiedFile got the correct path:

SearchClient.Instance.Conventions.UnifiedSearchRegistry.Add(typeof(UnifiedFile));
FileIndexer.Instance.Conventions.ShouldIndexVPPConvention = new VisibleInFilemanagerVPPIndexingConvention();
PageIndexer.Instance.Conventions.EnablePageFilesIndexing();

SearchClient.Instance.Conventions.UnifiedSearchRegistry
    .ForInstanceOf<UnifiedFile>()
    .CustomizeProjection(x => x.ProjectUrlFrom<UnifiedFile>(file => GetFileUrl(file.PermanentLinkVirtualPath)));

    

#63750
Edited, Nov 28, 2012 14:24

Joel: In one of our projects, with the same product versions that Johan Kronberg mentions but no PTB, we're not getting any titles or URLs. I have registered projections for those properties, but we only get a short excerpt.

SearchClient.Instance.Conventions.UnifiedSearchRegistry.Add(typeof(PageData));

SearchClient.Instance.Conventions.UnifiedSearchRegistry
    .ForInstanceOf<PageData>()
    .CustomizeProjection(x =>
        x.ProjectTitleFrom<PageData>(page =>
            GetPageTitle(page)));

SearchClient.Instance.Conventions.UnifiedSearchRegistry
    .ForInstanceOf<PageData>()
    .CustomizeProjection(x =>
        x.ProjectUrlFrom<PageData>(page =>
            GetPageUrl(page)));

SearchClient.Instance.Conventions.UnifiedSearchRegistry
    .ForInstanceOf<PageData>()
    .CustomizeProjection(x =>
        x.ProjectTypeNameFrom<PageData>(page =>
            GetPageTypeName(page)));

Is it possible to filter results from UnifiedSearch? I'm only able to filter on the UnifiedSearchHit object's properties, not the underlying PageData object's properties.

Maybe there is a way to add a filter in the conventions instead? What I would like to do is to exclude some pages (e.g. pages with 'ExcludePageInSearch' set to true).

#63751
Nov 28, 2012 15:01

The page variable will be null in those expressions :(

The expression must "point" to one or several properties that can be retrieved as fields from the index, for instance page => page.PageName. I.e., you could add an extension method for PageData that retrieves a Headline property and configure that to be indexed (included). Then you could do:

page => GetFirstNonEmpty(page.Headline(), page.PageName)

 

Regarding filtering you can tell the UnifiedSearchRegistry to hold two types of filters for you, one that is always applied for the type and one that is applied when doing public search. I *think* the syntax is ForInstancesOf<PageData>().AlwaysFilter(...).

Other than that you can do:

UnifiedSearchFor("..")
.Filter(x => !x.MatchTypeHierarchy(typeof(PageData)) | ((PageData)x).PageName.Match("Something"))

#63752
Nov 28, 2012 15:25

In regards to what you're saying about "The expression must "point""... How would I go about projecting file.Summary.Title as the title with file.Name as fallback? Tried a couple of variations without any luck.

#63753
Nov 28, 2012 15:47

If page is null, does that mean we can't use the indexer page["PageHeading"] in the expression? And we must use typed properties?

#63755
Nov 28, 2012 15:55

Johan K, create a method that returns the first non-empty string amongst several (FirstNonEmpty(params string[])), then use file => FirstNonEmpty(file.Summary.Title, file.Name).
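
For reference, a minimal sketch of what such a helper could look like. The class name StringFallback is hypothetical, and returning an empty string when no candidate qualifies is an assumption rather than anything Find requires; whether Find can resolve the projection still depends on the referenced properties being in the index.

```csharp
using System;

// Hypothetical helper class; the name is only for illustration.
public static class StringFallback
{
    // Returns the first candidate that is not null, empty, or whitespace.
    // Falls back to string.Empty when nothing qualifies (an assumption).
    public static string FirstNonEmpty(params string[] values)
    {
        // A literal null passed as the whole array ends up here.
        if (values == null)
        {
            return string.Empty;
        }

        foreach (var value in values)
        {
            if (!string.IsNullOrWhiteSpace(value))
            {
                return value;
            }
        }

        return string.Empty;
    }
}
```

With this in place the projection expression would read file => StringFallback.FirstNonEmpty(file.Summary.Title, file.Name). Note that string.IsNullOrWhiteSpace requires .NET 4; on .NET 3.5 string.IsNullOrEmpty would be the closest substitute.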

Johan P, correct. Find indexes code properties by default. You can instruct it to index other expressions, such as extension methods, but I'm 60% sure you can't tell it to index an indexer expression. Given your context I would personally create a number of extension methods for PageData that map to the property names in ISearchContent (SearchTitle, SearchHitUrl etc.), plus methods for any properties you need to filter on, and then tell Find's client conventions to include them when indexing. Then you won't need to customize projections, as nicely matching values are already in the index.

#63756
Nov 28, 2012 16:05

I took the Extension Method approach as well. Code for reference:

public static string SearchTitle(this UnifiedFile file)
{
    return file.Summary != null && !string.IsNullOrWhiteSpace(file.Summary.Title) ? file.Summary.Title : file.Name;
}

public static string SearchHitUrl(this UnifiedFile file)
{
    var url = new UrlBuilder(file.PermanentLinkVirtualPath);
    Global.UrlRewriteProvider.ConvertToExternal(url, null, Encoding.UTF8);
    return url.ToString();
}

    

 

// Index files
SearchClient.Instance.Conventions.UnifiedSearchRegistry.Add(typeof(UnifiedFile));
FileIndexer.Instance.Conventions.ShouldIndexVPPConvention = new VisibleInFilemanagerVPPIndexingConvention();
PageIndexer.Instance.Conventions.EnablePageFilesIndexing();

SearchClient.Instance.Conventions.ForInstancesOf<UnifiedFile>()
    .IncludeField(file => file.SearchHitUrl())
    .IncludeField(file => file.SearchTitle());
#63758
Edited, Nov 28, 2012 16:32

Another related issue: I have a PDF containing a certain word. When I search for this word in the Admin explorer view I get 1 file as the result. When doing UnifiedSearchFor with the same phrase on my search page I get no results. Searching on my search page for a phrase in the file name returns the same file. It seems that the file content is not searched (I'm guessing because of the missing hookups described above). How can I search it using Find's file content extraction?

#63775
Nov 29, 2012 11:01

In the 6 R2 CMS integration the actual file content is indexed as the return value of an extension method named Attachment (located in EPiServer.Find.Cms.UnifiedFileExtensions) while in the 7 CMS integration this method has been renamed SearchAttachment in order to match what Unified Search needs.

My suggestion would be to:

1. Create an extension for UnifiedFile named SearchAttachment.

public static class MyUnifiedFileExtensions
{
  // Exposes the 6 R2 Attachment() extension under the name Unified Search expects.
  public static Attachment SearchAttachment(this UnifiedFile file)
  {
    return file.Attachment();
  }
}

    

2. Exclude the original Attachment method so you won't have to index potentially large files twice.

3. Include your SearchAttachment method.

 

 

 

#63802
Nov 29, 2012 18:27

Works! Thanks!

Another thing... Which prop name/extension method handles the matching between File Best Bet and UnifiedFile? I think that's why my File Best Bets don't get boosted.

#63815
Nov 30, 2012 9:43

They are matched by the index ID (_id) which is retrieved using an extension method for UnifiedFile named GetIndexId. That method returns an id by creating a hash from either the PermanentLinkVirtualPath property or the VirtualPath property if the first is null. 

Are the files returned in the search result but not placed first?

#63852
Dec 03, 2012 0:06

Are the files returned in the search result but not placed first?

When they are in the result, they are not placed first. If a best bet is added for a word that doesn't return the file in the normal result, the file is still not in the result at all.

#63855
Dec 03, 2012 9:07

Hmm. Well, they shouldn't be returned if they don't match the search query (although we could certainly work around that too).

But the fact that they aren't boosted is strange. What you could do is inspect the request (or post it here for me and others to see). Here's how to do that if you're using IIS.

1. Install and run Fiddler.

2. Configure your application pool in IIS to run as your user. Note that this is crucial. The app pool must run as the exact same user as the one running Fiddler.

3. Trigger a search that should result in a best bet being applied.

4. Locate the request in the Fiddler log.

 

#63879
Dec 04, 2012 9:15

When I analyze the Request I can't see that anything concerning best bets is in the JSON sent to Find...

#63916
Edited, Dec 04, 2012 17:47

Could you post the request body here (with anything customer specific removed)? The best bet might not be that obvious :)

Also, could you try passing a language to the Search method if you don't already do that?

#63918
Dec 04, 2012 20:10

OK! Only a list of PTB page types has been removed:

{
   "from":0,
   "size":10,
   "query":{
      "filtered":{
         "query":{
            "query_string":{
               "fields":[
                  "SearchTitle$$string.sv",
                  "SearchText$$string.sv",
                  "SearchSummary$$string.sv",
                  "SearchAttachment$$attachment"
               ],
               "query":"dator"
            }
         },
         "filter":{
            "or":[
               {
                  "term":{
                     "___types":"EPiServer.Find.UnifiedSearch.ISearchContent"
                  }
               },
               {
                  "term":{
                     "___types":"EPiServer.Web.Hosting.UnifiedFile"
                  }
               }
            ]
         }
      }
   },
   "facets":{
      "SearchTypeName":{
         "terms":{
            "field":"SearchTypeName$$string"
         }
      },
      "SearchHitTypeName":{
         "terms":{
            "field":"SearchHitTypeName$$string"
         }
      },
      "All":{
         "filter":{
            "or":[
               {
                  "exists":{
                     "field":"SearchTitle$$string"
                  }
               },
               {
                  "not":{
                     "filter":{
                        "exists":{
                           "field":"SearchTitle$$string"
                        }
                     }
                  }
               }
            ]
         }
      }
   },
   "highlight":{
      "fields":{
         "SearchTitle$$string.sv":{
            "pre_tags":[
               "<strong>"
            ],
            "post_tags":[
               "</strong>"
            ],
            "number_of_fragments":0
         },
         "SearchSummary$$string.sv":{
            "pre_tags":[
               "<strong>"
            ],
            "post_tags":[
               "</strong>"
            ],
            "fragment_size":127,
            "number_of_fragments":2
         },
         "SearchText$$string.sv":{
            "pre_tags":[
               "<strong>"
            ],
            "post_tags":[
               "</strong>"
            ],
            "fragment_size":127,
            "number_of_fragments":2
         },
         "SearchAttachment$$attachment":{
            "pre_tags":[
               "<strong>"
            ],
            "post_tags":[
               "</strong>"
            ],
            "fragment_size":127,
            "number_of_fragments":2
         }
      }
   },
   "fields":[
      "___types",
      "$type",
      "SearchTitle$$string",
      "SearchHitUrl$$string",
      "SearchTypeName$$string",
      "SearchHitTypeName$$string",
      "SearchSection$$string",
      "SearchSubsection$$string",
      "SearchAuthors",
      "SearchPublishDate$$date",
      "SearchUpdateDate$$date",
      "SearchFilename$$string",
      "SearchFileExtension$$string",
      "SearchGeoLocation$$geo"
   ],
   "script_fields":{
      "SearchSummary$$string-cropped-255":{
         "script":"ascropped",
         "lang":"native",
         "params":{
            "field":"SearchSummary$$string",
            "length":255
         }
      },
      "SearchText$$string-cropped-255":{
         "script":"ascropped",
         "lang":"native",
         "params":{
            "field":"SearchText$$string",
            "length":255
         }
      }
   }
}

    

#63931
Dec 05, 2012 9:59

Thanks!

You're right, there's no best bet there. I'm not sure why it's not applied. Unless I'm mistaken, the latest CMS 6 integration has criteria for language and such when adding best bets, like the CMS 7 integration has. Perhaps such a criterion isn't met.

Let's see if one of the guys at EPi has some idea. Alternatively you could post your code as well and I'll gladly take a look.

#63943
Dec 05, 2012 12:23

This is what I have now. I have tried moving the extension functions around to "chain" in different order.

var query = SearchClient.Instance.UnifiedSearchFor(this.QueryParameter, Language.Swedish)
        .TermsFacetFor(x => x.SearchTypeName)
        .TermsFacetFor(x => x.SearchHitTypeName)
        .FilterFacet("All", x => x.SearchTitle.Exists() | !x.SearchTitle.Exists())
        .Skip(pageIndex * this.Pager.PageSize)
        .Take(this.Pager.PageSize)
        .Track()
        .ApplyBestBets();

// Sort
if (!string.IsNullOrWhiteSpace(this.SortParameter))
{
    switch (this.SortParameter)
    {
        case "date":
            query = query.OrderByDescending(x => x.SearchPublishDate);
            break;
        case "title":
            query = query.OrderBy(x => x.SearchTitle);
            break;
    }
}
            
// Filter facet
if (!string.IsNullOrWhiteSpace(this.TypeNameParameter))
{
    query = query.FilterHits(x => x.SearchTypeName.Match(this.TypeNameParameter));
}

if (!string.IsNullOrWhiteSpace(this.HitTypeNameParameter))
{
    query = query.FilterHits(x => x.SearchHitTypeName.Match(this.HitTypeNameParameter));
}

// Get results
this.Results = query
        .GetResult(
            new HitSpecification
            {
                HighlightTitle = true,
                HighlightExcerpt = true,
                ExcerptLength = 255,
                PreTagForAllHighlights = "<strong>",
                PostTagForAllHighlights = "</strong>"
            });

    

#63951
Dec 05, 2012 12:56