EPiServer Full Text Search Service

Product version:

EPiServer Framework 6.1 / 6.2

Overview

The EPiServer FTS Service is a standalone WCF REST service built on top of an unmodified version of the open source search engine Lucene.Net (http://lucene.apache.org/lucene.net/). It uses Atom, with the extensions described in Atom and Extensions, as its data protocol. The service can be deployed from EPiServer Deployment Center and is installed automatically as part of a Relate+ installation. Services are hosted by IIS and are invoked through the LuceneService.svc file, which means that multiple services can exist within the same environment. One service can also host multiple indexes, as described in Named indexes and Multi search. A typical setup is shown below:

 Example setup

Assemblies and Namespaces

The EPiServer.Search.IndexingService assembly contains the following namespaces:

  • EPiServer.Search.IndexingService
    Contains core classes for the EPiServer FTS Service, most notably the NamedIndex, IndexingService and IndexingServiceHandler classes.
  • EPiServer.Search.IndexingService.Configuration
    Contains configuration classes for the EPiServer FTS Service.
  • EPiServer.Search.IndexingService.Security
    Contains core classes for client authentication.
  • EPiServer.Search.IndexingService.IndexFieldSerializers
    Contains core classes for serializing and deserializing Atom feed items and Lucene Document Fields.

Service REST API

The EPiServer FTS Service defines the following REST endpoints:

Endpoint: [baseUri]/update/?accessKey={accessKey}
Method: POST
Comment: Accepts an Atom feed with feed items to add to the index, where:
  • “accessKey” is the connecting client's access key

Endpoint: [baseUri]/search/?q={q}&namedindexes={namedindexes}&format=xml&offset={offset}&limit={limit}&accesskey={accesskey}
Method: GET
Comment: Returns an Atom feed with feed items corresponding to hits in the index, where:
  • “q” is the query expression to parse
  • “namedindexes” is a pipe-separated list of indexes to search
  • “offset” is the offset from the first result item
  • “limit” is the number of items to return, counted from the offset
  • “accessKey” is the connecting client's access key

Endpoint: [baseUri]/reset/?namedindex={namedindex}&accessKey={accessKey}
Method: POST
Comment: Wipes all content for the passed named index.

Endpoint: [baseUri]/namedindexes/?accesskey={accesskey}
Method: GET
Comment: Returns an Atom feed with items corresponding to the named indexes configured for the service.
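As an illustration of the endpoint templates above, the following minimal Python sketch builds request URLs for the update and search endpoints. The function names, the example base URI and the access key value are hypothetical and chosen only for this example; the parameter names and their casing are copied from the templates. The sketch only builds URLs and does not call a service.

```python
from urllib.parse import urlencode

# Hypothetical base URI; the real value depends on where LuceneService.svc is hosted.
BASE_URI = "http://localhost/IndexingService/LuceneService.svc"


def update_url(base_uri, access_key):
    """Build the URL for POSTing an Atom feed to the update endpoint."""
    return f"{base_uri}/update/?" + urlencode({"accessKey": access_key})


def search_url(base_uri, q, named_indexes, offset, limit, access_key):
    """Build the URL for a GET request against the search endpoint."""
    params = [
        ("q", q),
        # Named indexes are passed as a pipe-separated list.
        ("namedindexes", "|".join(named_indexes)),
        ("format", "xml"),
        ("offset", offset),
        ("limit", limit),
        ("accesskey", access_key),
    ]
    return f"{base_uri}/search/?" + urlencode(params)


if __name__ == "__main__":
    print(update_url(BASE_URI, "local"))
    print(search_url(BASE_URI, "episerver", ["default", "files"], 0, 10, "local"))
```

Note that urlencode percent-encodes the pipe character and encodes spaces as “+”; this is ordinary query-string encoding and not something specific to the service.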

Feature highlights

 

ReferenceId

When the ReferenceId extension contains a value, the EPiServer FTS Service treats the item not as a standalone item but as one that belongs to another item already in the index. An example is comments in EPiServer Community. By default, when a comment is added to the index, its ReferenceId is set to the commented entity. When a query expression matches the contents of the comment, the commented entity is returned as the IndexResponseItem, not the comment itself.

Any IndexRequestItem with IndexAction.Add and a ReferenceId set is added to the reference index, which is created automatically for every configured index and carries the suffix “_ref”. The search engine internally finds the parent item corresponding to the ReferenceId (by id) and updates it so that the contents of all items in the “_ref” index that belong to the parent are added to the parent item's searchable metadata. This way, the client does not need to resend all data each time the entity is updated. It also means that you cannot search a specific field in reference data, because all default searchable fields are chunked together and added to the main item's metadata.

You cannot update the reference id property of an item once it has been added. Any item with the IndexAction Update or Remove automatically updates its parent in the index if it was originally added with the reference id set.

Content in files is indexed; the installed IFilters determine which files are included.
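The ReferenceId behavior described above can be illustrated with a small in-memory model. This is only a sketch of the idea and not the service's Lucene-based implementation: the ToyIndex class, its method names and the substring matching are assumptions made for this example.

```python
class ToyIndex:
    """Toy model of a main index plus its "_ref" reference index."""

    def __init__(self):
        self.main = {}  # item id -> {"text": own text, "ref_text": merged reference text}
        self.ref = {}   # item id -> {"reference_id": parent id, "text": text}

    def add(self, item_id, text, reference_id=None):
        if reference_id is None:
            # Standalone item: stored in the main index.
            self.main[item_id] = {"text": text, "ref_text": ""}
        else:
            # Reference item: stored in the "_ref" index, then folded into its parent.
            self.ref[item_id] = {"reference_id": reference_id, "text": text}
            self._refresh_parent(reference_id)

    def remove(self, item_id):
        if item_id in self.ref:
            parent_id = self.ref.pop(item_id)["reference_id"]
            self._refresh_parent(parent_id)
        else:
            self.main.pop(item_id, None)

    def _refresh_parent(self, parent_id):
        parent = self.main.get(parent_id)
        if parent is None:
            return
        # Chunk the text of all reference items together into the parent's metadata.
        parent["ref_text"] = " ".join(
            r["text"] for r in self.ref.values() if r["reference_id"] == parent_id
        )

    def search(self, term):
        # Only main items are returned, even when the match is in reference text.
        term = term.lower()
        return [
            item_id
            for item_id, doc in self.main.items()
            if term in (doc["text"] + " " + doc["ref_text"]).lower()
        ]


if __name__ == "__main__":
    index = ToyIndex()
    index.add("blog-1", "Release notes")
    index.add("comment-7", "Great upgrade", reference_id="blog-1")
    print(index.search("upgrade"))
```

In this model, a search for a word that occurs only in the comment yields the id of the commented entity, and removing the comment removes that match from the parent again.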

DataUri

When the DataUri extension contains a value, the EPiServer FTS Service immediately enqueues the whole request in an in-memory queue. Dequeuing a DataUri request results in a call to the URI, and the returned content is concatenated to IndexRequestItem.DisplayText and IndexRequestItem.MetaData according to the configured value of maxDisplayTextLength.

The EPiServer FTS Service currently supports only file URIs, and the file must be accessible locally from the service's perspective. This behavior may be changed by overriding the GetFileUriContent(Uri) and/or GetNonFileUriContent(Uri) methods in the IndexingServiceHandler.

Named indexes and Multi search

The EPiServer FTS Service allows multiple indexes to be configured. This is useful where there is an obvious separation of indexed content, or where there is an existing (Lucene-compatible) index that is updated from a different source. Multiple named indexes need to be configured so that the fields of the index documents map to the predefined field names in the service. See the FTS service configuration documentation.

When updating the index, the target index is specified by name in the “NamedIndex” attribute extension of the Atom-formatted request. When searching, multiple indexes may be specified in the request; the EPiServer FTS Service searches each of the specified indexes and returns a merged result set. If no named index is specified, the default index is used.

VirtualPath

The VirtualPath feature enables indexed content to be organized in a tree structure, so that searches can be restricted to items under a certain tree node. This is accomplished by storing the literal path together with the index document and searching the index with a path followed by a trailing wildcard. The path always includes the full path from the root, both when updating and when searching. For example, a document stored with the VirtualPath field set to “node1/node2” is a valid hit when searching for documents with the path “node1” or “node1/node2”. However, it is not considered if the path specifies only “node2”.
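The matching rule in the example above can be sketched as a plain prefix test. The function name is hypothetical, and the sketch assumes that the trailing wildcard amounts to a literal string-prefix match on the stored path; it says nothing about how the service builds the underlying Lucene query.

```python
def matches_virtual_path(stored_path, search_path):
    """Return True if stored_path would match search_path plus a trailing wildcard."""
    # Assumption: "path*" is treated as a literal prefix match on the stored path.
    return stored_path.startswith(search_path)


if __name__ == "__main__":
    for query in ("node1", "node1/node2", "node2"):
        print(query, matches_virtual_path("node1/node2", query))
```

Because the comparison starts at the root of the stored path, “node2” alone does not match a document stored under “node1/node2”.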

Paging

Paging of search results can be done in the service or in the client. The default configuration (useIndexingServicePaging = true) specifies that paging is done in the service, which therefore returns at most as many items as the passed pageSize. With client paging, the service may return up to the configured maxHitsFromIndexingService (default = 500) items. Consider client paging when filter providers are configured and pages need to remain intact.
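The difference between the two paging modes can be illustrated with a short sketch. The function names and the keep predicate (standing in for a client-side filter provider) are assumptions made for this example; only the default of 500 for maxHitsFromIndexingService comes from the text above.

```python
def page_in_service(hits, offset, limit):
    """Service paging: the service returns only the requested page."""
    return hits[offset:offset + limit]


def page_in_client(hits, offset, limit, max_hits=500, keep=lambda hit: True):
    """Client paging: fetch up to max_hits, filter, then cut out the page."""
    returned = hits[:max_hits]                       # capped by maxHitsFromIndexingService
    filtered = [hit for hit in returned if keep(hit)]  # hypothetical client-side filter
    return filtered[offset:offset + limit]


if __name__ == "__main__":
    hits = list(range(20))

    def is_even(hit):
        return hit % 2 == 0

    # Filtering after service paging can leave a short page.
    print([h for h in page_in_service(hits, 0, 5) if is_even(h)])
    # Filtering before client paging keeps the page full.
    print(page_in_client(hits, 0, 5, keep=is_even))
```

With service paging, a filter applied afterwards may remove items from an already cut page. With client paging, the filter runs first and the page is cut from the filtered list, at the cost of transferring more items from the service.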

Limitations

The EPiServer FTS Service handles only plain text. It does not understand markup languages such as HTML and therefore cannot calculate relevance based on markup (such as <h1>, <b>, etc.).

The EPiServer FTS Service does not perform web crawling or any automatic updates. All indexed content is pushed into the service, so the client is fully responsible for keeping the index up to date.

Third Party Search Engines

The loose coupling between the FTS Client and the FTS Service allows third-party search engines to implement solutions that are compatible with the FTS Client, independent of platform. The only requirement is to comply with the REST service endpoint specifications. In .NET environments, this can be done by overriding the UpdateIndex, GetSearchResults, GetNamedIndexes and ResetNamedIndex methods in the IndexingServiceHandler, thereby reusing the existing WCF REST service implementation.

Related Information

See also the following: