I'm building an EPiServer 8.5 + Find 9.2 site, and I'm wondering how I can get at the metadata embedded in uploaded files (PDF, DOC, DOCX, etc.). The site has a public search function that uses unified search, and for MediaData hits I would like to show the title from the file's own metadata in the SERP. By default, UnifiedSearchHit.Title contains the filename.
I suppose I could implement SearchTitle and extract the metadata myself, but that seems silly since Find/Elastic is already doing that (I can find the file in the index explorer by searching for the metadata title).
I'm hoping there's a simple way to get the metadata!
UnifiedSearchHit.Title is fetched from the SearchTitle property, which by default is the filename. You can change this by adding a SearchTitle property to your MediaData model, where you fetch/set the value to whatever suits you.
Yes, I'm aware of how SearchTitle works, but to "fetch/set the value to whatever suits you" is not so trivial in this case, and I was hoping I could somehow take advantage of the fact that Elastic has already done the heavy lifting with Apache Tika.
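You can change the Find convention on startup to project the SearchTitle from a different property:

```csharp
SearchClient.Instance.Conventions.UnifiedSearchRegistry.ForInstanceOf<SiteMediaData>().ProjectTitleFrom(x => x.Name);
```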
This would need to be called in an initialisation module. It assumes you have a common base class across all of your uploaded media files; in the example above, this is the SiteMediaData class.
Thanks for chiming in, but the question really isn't how to set the Title; that part isn't a problem.
The word "metadata" might be confusing. By metadata I don't mean properties on a MediaData descendant; I mean the metadata (author, title, etc.) embedded in the actual file (.pdf, .docx, etc.).
Since Find has already extracted (using Apache Tika) and indexed this metadata, I hoped I would be able to get to it somehow instead of having to do that extraction myself a second time.
When indexing files, Find extracts and indexes their metadata properties, but the API currently doesn't support projecting those metadata properties onto UnifiedSearchHit properties.
Thanks for the clarification Henrik!
Did you resolve this in any way, fsdf Adrup?
No, we skipped it. I actually asked Henrik about this just a month or two ago, and the situation seemed to be the same.
If you need to get at the metadata, I think your best bet is to index it yourself using IFilters or similar.
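For .docx files specifically, a lighter alternative to IFilters is to read the metadata straight out of the package: a .docx is a ZIP archive whose core properties (title, creator, etc.) are stored as Dublin Core elements in docProps/core.xml. This is a minimal sketch, not Find's own mechanism. It assumes the conventional part path docProps/core.xml instead of resolving it through the package relationships, it does not cover the legacy binary .doc format, and `DocxMetadata.ReadTitle` is a hypothetical helper name. On .NET Framework the project needs a reference to the System.IO.Compression assembly.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxMetadata
{
    // Dublin Core namespace; OOXML core properties store the title as <dc:title>.
    private static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";

    // Returns the trimmed embedded title of a .docx package, or null if none is set.
    // Assumption: the core-properties part sits at the conventional path
    // docProps/core.xml; a fully general reader would resolve it via _rels/.rels.
    public static string ReadTitle(Stream docx)
    {
        // leaveOpen: true so the caller keeps ownership of the stream
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("docProps/core.xml");
            if (entry == null)
            {
                return null;
            }

            using (Stream part = entry.Open())
            {
                XElement title = XDocument.Load(part).Descendants(Dc + "title").FirstOrDefault();
                string value = title == null ? null : title.Value.Trim();
                return string.IsNullOrEmpty(value) ? null : value;
            }
        }
    }
}
```

In a MediaData model for .docx files, the stream would typically come from `BinaryData.OpenRead()`, and the result would still need a fallback for documents with no title set.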
Thanks for your reply. I did something similar and indexed the field myself using iTextSharp:
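```csharp
// PdfReader (iTextSharp.text.pdf): Info is a Dictionary<string, string>, so a missing key would throw
using (PdfReader reader = new PdfReader(BinaryData.OpenRead()))
{
    if (reader.Info.ContainsKey("Title"))
        _metatitle = reader.Info["Title"];
}
```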
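Whichever extractor is used, the SearchTitle getter still needs a sensible fallback for files that carry no embedded title, or only a blank one. A minimal sketch, assuming the metadata arrives as a string dictionary (iTextSharp 5's `PdfReader.Info` is a `Dictionary<string, string>`); `MediaTitle.Resolve` is a hypothetical helper, and falling back to the file name simply mirrors the default UnifiedSearchHit.Title.

```csharp
using System;
using System.Collections.Generic;

public static class MediaTitle
{
    // Returns the embedded "Title" entry when it is present and non-blank,
    // otherwise the file name (what UnifiedSearchHit.Title shows by default).
    // 'info' can be any string dictionary, e.g. iTextSharp 5's PdfReader.Info.
    public static string Resolve(IDictionary<string, string> info, string fileName)
    {
        string title;
        if (info != null
            && info.TryGetValue("Title", out title)
            && !string.IsNullOrWhiteSpace(title))
        {
            return title.Trim();
        }

        return fileName;
    }
}
```

Calling it from the model's SearchTitle getter with the file's metadata and its Name keeps the hit title readable either way. Caching the result (as the `_metatitle` field suggests) avoids re-parsing the file every time the property is read during indexing.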