Using Azure Search as a search solution in EPiServer
Microsoft released a new preview search solution in Azure a little while ago, and this post is about how to use it in an EPiServer website. I am trying it out because the built-in EPiServer Search does not work when an EPiServer site is deployed to Azure as an Azure website, and it feels bad not to be able to search.
Azure Search is available to anyone with an Azure account, and right now there are two levels of it, free and standard. Free is, as it sounds, free of charge and has these limitations:
· Up to 3 indexes
· Up to 10 000 documents
· Up to 50 MB of storage
· No scaling
· Shared environment
Standard currently costs 125 USD/month and has these limitations:
· Up to 50 indexes
· Up to 15 000 000 documents
· Up to 300 GB storage
· Dedicated environment
· Possibility to scale up to max 35 units
Read more about price and limitations here: http://azure.microsoft.com/en-us/pricing/details/search/
For most websites 10 000 documents/pages is enough, so the two limitations that can become a problem are the storage and the number of indexes. The storage is enough if your pages do not contain a lot of data, but 50 MB might be consumed pretty fast. Three indexes might seem like a lot, but Azure Search does not treat an index the way EPiServer Find, for example, does. In Azure Search an index holds one type of object, so if you want to do more advanced searches across different page types you may need one index per page type, and then three is a very small number.
How to use it
You first have to create your search service in the Azure preview portal. Read more about it here: http://azure.microsoft.com/en-us/documentation/articles/search-configure/
Create index
After that you have to think carefully about what you want to index, since every kind of object is its own index with its own HTTP connection and so on. In this first attempt I will keep it very simple and use only one index, holding a condensed set of information from the pages. The index definition looks like this:
var index = new
{
    name = "pages",
    fields = new[]
    {
        // The key field must be of type Edm.String; the page id is sent as a string.
        new { name = "id", type = "Edm.String", key = true },
        new { name = "name", type = "Edm.String", key = false },
        new { name = "linkurl", type = "Edm.String", key = false },
        new { name = "metatitle", type = "Edm.String", key = false },
        new { name = "metadescription", type = "Edm.String", key = false },
        new { name = "teasertext", type = "Edm.String", key = false },
        new { name = "mainbody", type = "Edm.String", key = false },
        new { name = "contenttypeid", type = "Edm.Int32", key = false },
    }
};
Azure Search supports a subset of the EDM (Entity Data Model) data types in indexes and documents. Read more about it here: http://msdn.microsoft.com/en-us/library/azure/dn798938.aspx
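The index definition still has to be sent to the service before any documents can be added. Here is a minimal sketch of how that request could be built. It assumes the index is created with a POST to /indexes, using the same api-version and api-key header as the indexing call in this post. IndexRequestSketch and its parameters are made-up names, and System.Text.Json is used only to keep the example free of extra packages.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Hypothetical helper for illustration; not part of the downloadable project.
public static class IndexRequestSketch
{
    // Assumption: same preview api-version as the document indexing call.
    private const string ApiVersion = "2014-07-31-Preview";

    public static HttpRequestMessage BuildCreateIndexRequest(string rootUrl, string apiKey, object indexDefinition)
    {
        // Assumption: indexes are created with POST {root}/indexes.
        var url = rootUrl.TrimEnd('/') + "/indexes?api-version=" + ApiVersion;

        // Serialize using the runtime type so anonymous objects keep their properties.
        var json = JsonSerializer.Serialize(indexDefinition, indexDefinition.GetType());

        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("api-key", apiKey);
        return request;
    }
}
```

The request would then be sent once with HttpClient.SendAsync, for example from a one-off admin action, followed by EnsureSuccessStatusCode on the response.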
As you can see, this is far from all the information needed for a full-scale search in EPiServer, and I can already tell that this will not be a fully acceptable replacement for EPiServer Search in Azure, but I will go on just to test it out.
To keep it simple I will not implement logic that listens for the publish page event. I will just create a scheduled job that iterates through all pages and adds them to the index. The code is on GitHub, so you can extend it yourself if you want.
Populate index
The best way to do this on a live site is to push updates when an editor presses publish, but for this test I am only going to populate the index with a scheduled job. The job goes through all pages on the site for the page types that are interesting, maps each page to a class, serializes it and sends it to the index. I do it per page type because I do not know how Azure Search handles a very big request. The scheduled job looks like this:
[ScheduledPlugIn(DisplayName = "[Azure Search] Update index", Description = "Update the index with all published pages")]
public class UpdateAzureSearchService
{
    public static string Execute()
    {
        var totalStopWatch = new Stopwatch();
        totalStopWatch.Start();
        // One request per page type keeps each request reasonably small.
        IndexPageType(typeof(StartPage).GetPageType());
        IndexPageType(typeof(ArticlePage).GetPageType());
        IndexPageType(typeof(NewsPage).GetPageType());
        IndexPageType(typeof(ProductPage).GetPageType());
        IndexPageType(typeof(StandardPage).GetPageType());
        totalStopWatch.Stop();
        return string.Format("Azure search service updated. Time taken: {0}", totalStopWatch.Elapsed);
    }

    private static void IndexPageType(ContentType contentType)
    {
        var pages = FilterForVisitor.Filter(DataFactory.Instance.GetAllPagesOfCertainPageType(contentType.ID, ContentReference.RootPage));
        if (pages == null) return;

        // Wrap the mapped pages in the { "value": [...] } shape the service expects.
        var pagesToUpdateObject = new UpdateAzureSearch
        {
            value = pages.Select(page => page.MapContentToAzureSearchServiceObject()).ToList()
        };
        var url = ConfigurationManager.AppSettings["AzureSearchServiceRootUrl"] + "/indexes/pages/docs/index?api-version=2014-07-31-Preview";
        var json = JsonConvert.SerializeObject(pagesToUpdateObject);

        // Dispose the client and the content when the request is done.
        using (var client = new HttpClient())
        using (var content = new StringContent(json, Encoding.UTF8, "application/json"))
        {
            client.DefaultRequestHeaders.Add("api-key", ConfigurationManager.AppSettings["AzureSearchServiceApiKey"]);
            // Blocking on .Result is acceptable here because the job runs synchronously.
            var response = client.PostAsync(url, content).Result;
            response.EnsureSuccessStatusCode();
        }
    }
}
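Because it is unclear how the service copes with one very large request, the documents could also be split into smaller batches before they are posted. This is a minimal sketch, assuming the caller picks the batch size; the service's actual per-request limit has not been verified in this post. BatchSketch and Split are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the downloadable project.
public static class BatchSketch
{
    // Lazily yields consecutive batches of at most batchSize items.
    public static IEnumerable<List<T>> Split<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the last, possibly smaller, batch.
        if (batch.Count > 0) yield return batch;
    }
}
```

In IndexPageType each batch would then get its own UpdateAzureSearch wrapper and its own POST, so a failure only affects the batch that caused it.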
GetAllPagesOfCertainPageType is an extension method that looks like this:
// Extension methods must be declared in a static class.
public static PageDataCollection GetAllPagesOfCertainPageType(this DataFactory datafactory, int pageTypeDefinitionId, PageReference root)
{
    // Match every page below root whose page type id equals the given id.
    var propertyCriteriaCollection = new PropertyCriteriaCollection
    {
        new PropertyCriteria
        {
            Condition = CompareCondition.Equal,
            Name = "PageTypeID",
            Type = PropertyDataType.PageType,
            Value = pageTypeDefinitionId.ToString(CultureInfo.InvariantCulture),
            Required = true
        }
    };
    // Use the instance the method was called on rather than DataFactory.Instance.
    return datafactory.FindPagesWithCriteria(root, propertyCriteriaCollection);
}
And MapContentToAzureSearchServiceObject is an extension method that looks like this:
public static AzureSearchServiceObject MapContentToAzureSearchServiceObject(this PageData pageData)
{
    return new AzureSearchServiceObject
    {
        id = pageData.ContentLink.ID.ToString(CultureInfo.InvariantCulture),
        contenttypeid = pageData.ContentTypeID,
        name = pageData.Name,
        mainbody = pageData.GetPropertyValue("MainBody", string.Empty),
        linkurl = pageData.LinkURL,
        metadescription = pageData.GetPropertyValue("MetaDescription", string.Empty),
        metatitle = pageData.GetPropertyValue("MetaTitle", string.Empty),
        teasertext = pageData.GetPropertyValue("TeaserText", string.Empty)
    };
}
I am using a wrapper object called UpdateAzureSearch with a single property called value, which is a list of AzureSearchServiceObject, because Azure Search demands that the JSON sent to it has that shape. The objects are defined like this:
public class AzureSearchServiceObject
{
    public string id { get; set; }
    public string name { get; set; }
    public string linkurl { get; set; }
    public string metatitle { get; set; }
    public string metadescription { get; set; }
    public string teasertext { get; set; }
    public string mainbody { get; set; }
    public Int32 contenttypeid { get; set; }
}

public class UpdateAzureSearch
{
    public List<AzureSearchServiceObject> value { get; set; }
}
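To make the required shape concrete, here is a small sketch that serializes one trimmed-down document inside a value wrapper. It uses anonymous objects and System.Text.Json instead of the classes above and Json.NET, purely so that it stands alone. The sample values are invented and only three of the fields are included.

```csharp
using System.Text.Json;

// Hypothetical illustration of the request body shape; not part of the downloadable project.
public static class PayloadShapeSketch
{
    public static string Serialize()
    {
        var payload = new
        {
            // The service expects the documents in a top-level "value" array.
            value = new[]
            {
                new { id = "5", name = "Start", contenttypeid = 12 }
            }
        };

        // Produces: {"value":[{"id":"5","name":"Start","contenttypeid":12}]}
        return JsonSerializer.Serialize(payload);
    }
}
```

The lowercase property names matter because they are written to the JSON as-is and have to match the field names in the index definition.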
Use the index
This test site is created with the Alloy template, which already contains a search page, so all I did was create my own SearchService and remove the one that uses EPiServer Search. I made it as simple as possible and only do a plain text search with no stemming and so on. Azure Search is built on Elasticsearch, so there are a lot of possibilities for more advanced queries, but this is only a proof of concept, so I keep it as simple as possible. The SearchService is part of the code on GitHub linked at the end of this post.
I then simplified the search controller so that it looks like this:
public class SearchPageController : PageControllerBase<SearchPage>
{
    private readonly AzureSearchService _searchService;

    public SearchPageController(AzureSearchService searchService)
    {
        _searchService = searchService;
    }

    [ValidateInput(false)]
    public ViewResult Index(SearchPage currentPage, string q)
    {
        var model = new SearchContentModel(currentPage)
        {
            SearchServiceDisabled = false,
            SearchedQuery = q
        };
        // Only query the service when the visitor actually typed something.
        if (!string.IsNullOrWhiteSpace(q))
        {
            var hits = Search(q.Trim()).ToList();
            model.Hits = hits;
            model.NumberOfHits = hits.Count;
        }
        return View(model);
    }

    private IEnumerable<SearchContentModel.SearchHit> Search(string searchText)
    {
        return _searchService.Search(searchText);
    }
}
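The controller only delegates to the search service. As a rough idea of what a plain text query against the index involves, here is a minimal sketch. It is not the class from the project. It assumes that a search is a GET on /indexes/{index}/docs with a search parameter, and that hits come back in a top-level value array. SearchQuerySketch and its method names are hypothetical, and System.Text.Json is used only to avoid extra packages.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for illustration; not the SearchService from the project.
public static class SearchQuerySketch
{
    // Assumption: same preview api-version as the indexing call.
    private const string ApiVersion = "2014-07-31-Preview";

    // Builds the query URL; the search text is URL-encoded.
    public static string BuildSearchUrl(string rootUrl, string indexName, string searchText)
    {
        return rootUrl.TrimEnd('/')
            + "/indexes/" + indexName
            + "/docs?search=" + Uri.EscapeDataString(searchText)
            + "&api-version=" + ApiVersion;
    }

    // Reads the name and linkurl fields of every hit.
    // Assumption: the response looks like { "value": [ { "name": ..., "linkurl": ... } ] }.
    public static List<KeyValuePair<string, string>> ReadNamesAndUrls(string responseJson)
    {
        var hits = new List<KeyValuePair<string, string>>();
        using (var document = JsonDocument.Parse(responseJson))
        {
            foreach (var hit in document.RootElement.GetProperty("value").EnumerateArray())
            {
                hits.Add(new KeyValuePair<string, string>(
                    hit.GetProperty("name").GetString(),
                    hit.GetProperty("linkurl").GetString()));
            }
        }
        return hits;
    }
}
```

The URL would be requested with an HttpClient that carries the api-key header, and each name and URL pair would then be mapped to a SearchContentModel.SearchHit.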
Conclusion
This is a preview service from Azure and some things could be better, such as not treating each object type as its own index. But it definitely works, and for simple sites it can be a solution. For bigger sites it is better to pick up the wallet and pay for EPiServer Find.
If you would like to test it yourself, you can download the code here: https://github.com/hesta96/AzureSearchServicesTest
To make the code work, all you have to do is create your own Azure Search service in the Azure Preview Portal (https://portal.azure.com) and update web.config with your URL and key. The database is included in the project and all blobs are saved in the database, so after that it should just be a matter of pressing F5.
Why is Windows Live Writer so damn hard to use?
Agreed, Live Writer sucks and usually makes me say a lot of words that are better left unsaid.
Linus, it ended up with me having to change the HTML code by hand, and yes, there were some ugly words.
It took nearly as long to create the blog post and make it look OK as it took to write all the test code that I blogged about...
Interesting!
Interesting. I must say that the price plan differences between Azure Search and EPiServer Find make me wonder if we should go for Azure Search.
Can someone say which features in Find make it worth going for Find?
(The difference/limitation in what an index is in Azure Search is not a problem for us. I think we can have the same properties/fields for all our pages/documents.)