To highlight on all attachments in a non-typed way, you can use the following gist: Highlighting all attachment fields
/Henrik
Hi Henrik,
Thanks for the tip, but can you provide some explanation of how to use the Find Attachments feature in these circumstances?
As per my original post, the documentation on Attachments makes it sound like this is only for files hosted outside EPiServer rather than for IContent items. We need users to be able to upload documents into the CMS media tree as they usually would, and drop an arbitrary number of these into a content area (or similar) for inclusion on the page.
Cheers,
Alex
As for searching for an arbitrary number of attachments in a content area, perhaps of different media types, I think the solution works OK.
In the general case, you can query IContentMedia like any other IContent and access the attachment data of the actual file using the .SearchAttachment() extension.
/Henrik
When you say "the solution works OK", do you mean that what we've currently got (matching text in attachments, but not being able to highlight or boost) is probably as good as we're going to get here?
I don't think I follow your postscript about using SearchAttachment - I can't find any documentation for this, other than this stub, which doesn't shed much light. How could we use this to improve our solution?
Is there any method which lets us, if we have an IContent object, use EPiServer Find to do a full-text search (with highlighting) on that object, assuming it's indexed?
Alex
I posted a gist with an example of how to highlight. As for boosting, you need to be able to specify individual fields, and for attachments this cannot be done because the entire attachment is parsed as a single field (i.e. you can't boost titles/headers within a document).
If you have an IContent object, you can highlight it as shown in the documentation. As for IContentMedia, what I meant is that if you want to highlight the actual content of the file, you do this:
searchResult = client.Search<MyMediaData>()
    .For("Banana")
    .Select(x => new { HighlightedAttachment = x.SearchAttachment().AsHighlighted() })
    .GetResult();
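As a rough sketch of how that result could be consumed (untested against Find 8): MyMediaData stands for your own media content type, while the AttachmentHighlightExample class, the PrintHighlights method and the extra Name projection are made up purely for illustration. The query itself is the same as above.

```csharp
using System;
using EPiServer.Find;
using EPiServer.Find.Cms;

// Illustrative helper only; MyMediaData is assumed to be your own media
// content type (deriving from MediaData) that Find indexes with its attachment.
public static class AttachmentHighlightExample
{
    public static void PrintHighlights(IClient client, string query)
    {
        var searchResult = client.Search<MyMediaData>()
            .For(query)
            .Select(x => new
            {
                // Name is projected only so each fragment can be tied to a file.
                Name = x.Name,
                HighlightedAttachment = x.SearchAttachment().AsHighlighted()
            })
            .GetResult();

        // The result is enumerable; each hit carries the projected values.
        foreach (var hit in searchResult)
        {
            Console.WriteLine("{0}: {1}", hit.Name, hit.HighlightedAttachment);
        }
    }
}
```

You would call it with something like AttachmentHighlightExample.PrintHighlights(SearchClient.Instance, "Banana"). A file that matched on something other than its attachment text may come back with an empty highlight.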
/Henrik
I found a way to do this, in the end. Per's suggestion in this post, and the subsequent discussion on that thread, put me on the right track.
In the unlikely event that anyone else hits this, here's how it works:
It seems to work. I'm not sure how it would scale, but we're only dealing with ~40 instances of FooPage. There's some scope to add caching in there if required.
Alex
We have a client requirement for a page type which acts as a collection of multiple documents: one primary document (a PDF), and an arbitrary number of secondary documents (PDFs, DOCX/XLSX files, images). The documents are all stored in the CMS as IContent. The client wants a search control on the page which can match against text in any of these documents, and return the parent page as the result. The individual files themselves should not be returned as search results. This doesn't need to be a site-wide search, and can be specific to the collection page type.
Is there any decent way to do this? We're on EPiServer 7.19/Find 8, and can't update at this stage.
I have a working-but-not-great solution, achieved by using Content Areas to hold the documents, and enabling IndexInContentAreas on the MediaData files. If I then use:
I get pages based on the text in these content areas - I've tested this with unique words in PDFs, and it definitely works. Great! Unfortunately, I can't see any way of communicating to the user why the match was made - I've tried creating highlights as per the documentation here, but this just returns an empty string, and there is no available excerpt for this search type.
My first guess was to use UnifiedSearch and grab the Excerpt and Highlight from there, but for whatever reason, UnifiedSearch does not match against words from the PDFs in content areas in the way conventional search does.
I've seen the documentation around attachments, but this is a bit abstract and I can't find an example of how this can be used as part of a Page, and whether it can accept IContent MediaData, or if it's meant solely for indexing binary content that sits outside of EPiServer's own media tree. Would I need to intercept the publish action for my page and create/update an instance of a POCO container class which links the page to the attachments for EPiServer Find to index?