How to display metadata from file in unified search?


I'm building an EPiServer 8.5 + Find 9.2 site and I wonder how I can get at the metadata embedded in uploaded files (pdf, doc, docx etc). The site has a public search function that uses unified search, and for the MediaData hits I would like to display the title metadata from the file itself in the SERP. By default UnifiedSearchHit.Title contains the filename.

I suppose I could implement SearchTitle and extract the metadata myself, but that seems silly since Find/Elastic is already doing that (I can find the file in the index explorer by searching for the metadata title).

I'm hoping there's a simple way to get the metadata!

#123370
Jul 02, 2015 14:26

UnifiedSearchHit.Title is fetched from the SearchTitle property, and that is, by default, the filename. You can manipulate this by adding a SearchTitle property on your MediaData model where you fetch/set the value to whatever suits you.  

#131854
Aug 07, 2015 8:54

Yes, I'm aware of how SearchTitle works, but to "fetch/set the value to whatever suits you" is not so trivial in this case. I was hoping I could somehow take advantage of the fact that Elastic has already done the heavy lifting with Apache Tika.

#132670
Aug 17, 2015 10:11

You can change the Find convention on startup to project the SearchTitle from a different property:

SearchClient.Instance.Conventions.UnifiedSearchRegistry
    .Add<SiteMediaData>()
    .ProjectTitleFrom(x => x.Name);

This would need to be called in an initialisation module. This assumes you have a common base class across all of your uploaded / media files. In the example above this is the SiteMediaData class.
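
As a rough sketch, the call above could be wrapped in an initialisation module like this. The class name FindConventionsInitialization is made up for illustration, SiteMediaData is the assumed common base class mentioned above and has to exist in your project, and the using directives are what I believe the relevant namespaces to be, so adjust them to your Find version.

```csharp
using EPiServer.Find.Framework;       // SearchClient (assumed namespace)
using EPiServer.Find.UnifiedSearch;   // UnifiedSearchRegistry extensions (assumed namespace)
using EPiServer.Framework;
using EPiServer.Framework.Initialization;

// Hypothetical module name; runs once at site startup.
[InitializableModule]
[ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
public class FindConventionsInitialization : IInitializableModule
{
    public void Initialize(InitializationEngine context)
    {
        // Register the media base class for unified search and
        // project UnifiedSearchHit.Title from its Name property.
        // SiteMediaData is assumed to be defined elsewhere in the project.
        SearchClient.Instance.Conventions.UnifiedSearchRegistry
            .Add<SiteMediaData>()
            .ProjectTitleFrom(x => x.Name);
    }

    public void Uninitialize(InitializationEngine context)
    {
        // Nothing to clean up in this sketch.
    }
}
```

Note that this only projects the title from a property on the content type; it does not reach the metadata embedded in the file itself.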

#132695
Aug 17, 2015 15:40

Thanks for chiming in, but the question really isn't how to set the Title; that's not a problem.

The word metadata might be confusing. By metadata I don't mean properties on a MediaData descendant, I mean the metadata (author, title etc) embedded in the actual file (.pdf, .docx etc).

Since Find has already extracted (using Apache Tika) and indexed this metadata, I hoped I would be able to get to it somehow instead of having to do that extraction myself a second time.

#132703
Aug 17, 2015 16:47

When indexing 'files', Find extracts and indexes metadata properties, but currently the API doesn't support projecting metadata properties to UnifiedSearchHit properties.

#132876
Aug 19, 2015 20:54

Thanks for the clarification Henrik!

#132877
Aug 19, 2015 21:01

Did you resolve this in any way fsdf Adrup?

#188820
Mar 05, 2018 10:16

No, we skipped it. I actually asked Henrik about this just a month or two ago, and the situation seemed to be the same.

If you need to get at the metadata, I think your best bet is to index it yourself using IFilters or similar.

#188843
Mar 05, 2018 16:39

Hi

Thanks for your reply. I did something similar and indexed the field myself using iTextSharp:

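// Fragment from a property getter on the media type; _metatitle is its backing field.
try
{
    // The try wraps the using block so a file that fails to open as a PDF is ignored too.
    using (PdfReader reader = new PdfReader(BinaryData.OpenRead()))
    {
        // Info holds the PDF document information entries (Title, Author etc).
        if (reader.Info.ContainsKey("Title") && reader.Info["Title"] != null)
        {
            _metatitle = reader.Info["Title"];
            return _metatitle;
        }
    }
}
catch (Exception)
{
    // Ignore: the title could not be read from this file.
}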



#188858
Mar 06, 2018 8:33