How to display metadata from file in unified search?

I'm building an EPiServer 8.5 + Find 9.2 site and I wonder how I can get at the metadata embedded in the uploaded files (pdf, doc, docx etc). The site has a public search function that uses unified search, and for the MediaData hits I would like to display the title metadata from the file itself in the search results page. By default UnifiedSearchHit.Title contains the filename.

I suppose I could implement SearchTitle and extract the metadata myself, but that seems silly since Find/Elastic is already doing that (I can find the file in the index explorer by searching for the metadata title).

I'm hoping there's a simple way to get the metadata!

#123370
Jul 02, 2015 14:26

UnifiedSearchHit.Title is fetched from the SearchTitle property, and that is, by default, the filename. You can manipulate this by adding a SearchTitle property on your MediaData model where you fetch/set the value to whatever suits you.  
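
A minimal sketch of the shape such a property could take. To keep it self-contained, DocumentFile here is a plain stand-in class; on a real site it would derive from MediaData, which already provides Name. DocumentTitle is a hypothetical editor-maintained property that is not part of the original suggestion.

```csharp
using System;

// Hypothetical stand-in for a media content type. On a real site this class
// would derive from EPiServer.Core.MediaData instead of declaring Name itself.
public class DocumentFile
{
    // The file name, as MediaData.Name would hold it.
    public string Name { get; set; }

    // Hypothetical property that an editor (or an import job) fills in.
    public string DocumentTitle { get; set; }

    // Read for UnifiedSearchHit.Title, as described in the post above.
    // Falls back to the file name when no title has been set.
    public string SearchTitle
    {
        get { return string.IsNullOrWhiteSpace(DocumentTitle) ? Name : DocumentTitle; }
    }
}
```

With only Name set, SearchTitle returns the file name; once DocumentTitle has a non-blank value, that value is returned instead.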

#131854
Aug 07, 2015 8:54

Yes, I'm aware of how SearchTitle works, but to "fetch/set the value to whatever suits you" is not so trivial in this case. I was hoping that I could somehow take advantage of the fact that Elastic has already done the heavy lifting with Apache Tika.

#132670
Aug 17, 2015 10:11

You can change the Find convention on startup to project the SearchTitle from a different property:

SearchClient.Instance.Conventions.UnifiedSearchRegistry
    .Add<SiteMediaData>()
    .ProjectTitleFrom(x => x.Name);

This would need to be called in an initialisation module. This assumes you have a common base class across all of your uploaded / media files. In the example above this is the SiteMediaData class.

#132695
Aug 17, 2015 15:40

Thanks for chiming in, but the question really isn't how to set the Title, that's not a problem.

The word metadata might be confusing. By metadata I don't mean properties on a MediaData descendant, I mean the metadata (author, title etc) embedded in the actual file (.pdf, .docx etc).

Since Find has already extracted (using Apache Tika) and indexed this metadata, I hoped I would be able to get to it somehow instead of having to do that extraction myself a second time.

#132703
Aug 17, 2015 16:47

When indexing files, Find extracts and indexes their metadata properties, but the API currently doesn't support projecting those metadata properties to UnifiedSearchHit properties.

#132876
Aug 19, 2015 20:54

Thanks for the clarification Henrik!

#132877
Aug 19, 2015 21:01

Did you resolve this in any way fsdf Adrup?

#188820
Mar 05, 2018 10:16

No, we skipped it. I actually asked Henrik about this just a month or two ago, and the situation seemed to be the same.

If you need to get at the metadata, I think your best bet is to index it yourself using IFilters or similar.

#188843
Mar 05, 2018 16:39

Hi

Thanks for your reply. I did something similar and indexed the field myself using iTextSharp:

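try
{
    // PdfReader can throw on corrupt or non-PDF data, so it is created inside the try.
    using (var stream = BinaryData.OpenRead())
    using (PdfReader reader = new PdfReader(stream))
    {
        // Info only holds the entries present in the PDF's document information.
        if (reader.Info.ContainsKey("Title") && reader.Info["Title"] != null)
        {
            _metatitle = reader.Info["Title"];
            return _metatitle;
        }
    }
}
catch (Exception)
{
    // Ignore unreadable files; the caller falls through to its default title.
}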



#188858
Mar 06, 2018 8:33