I'm going to use a forum as an example. If I were developing this without EPiServer or a CMS, I would have to create a custom database and tables, and then index them into Elasticsearch. What I'm trying to determine is whether it's not just feasible but more efficient to use EPiServer as the data store and leverage the built-in index mapping to Find.
Essentially we would have a list of forum pages that registered users could create whenever they liked, and on each forum page there are obviously comments that can be made by different people. There would also be a profile page where you would see forum activity, and people could also comment on your profile "wall".
So, using EPiServer, would it be better to store the main forum pages as one page type, then store comments in a separate section and provide a custom field id link back to the original forum? Essentially it would be creating different "indexes" that are separate, but can be brought together via search queries with Find.
I am looking at storing hundreds of thousands of items, so performance is key. Obviously Find is a great solution; I'm thinking more about whether to store the user generated data in EPiServer completely, or to have custom tables and then index those objects into Find manually.
There is a CMS element whereby an administrator can page through forums and results, but this would not be efficient through the content tree given the number of items, so I would have to create custom admin pages (which is fine).
Any thoughts/ideas on this are greatly appreciated.
I'm not sure I fully understand your question. Are you thinking about using Find as the only storage for some objects (comments?) or are you contemplating different ways to build the forum using storage either as pages or as custom objects stored in a database or the DDS and want ideas about how to index these objects?
Either way, should users be able to search the forum for threads or individual comments/posts?
Because the majority of the content of the site would be user generated, I'm looking for the best data store for all user generated content. There will also be some content that will be generated by admin (CMS), for example a set list of "category" forum pages. I'm looking at all options for storage, and trying to determine the best method as either pages or as custom objects.
Obviously with Find, pages are automatically indexed, so having that functionality out-of-the-box is quite advantageous. With the new EPiServer 7 data architecture, I'm assuming that it would not even be a question, as this would be an excellent use case for the new data store.
All I want to use Find for is the search of forum pages, user profiles, and potentially going into comments as well.
Alright, so somewhat simplified you have these options:
In order to build search/querying functionality with Find, objects need to be indexed. This involves:
Of course, given that reindexing functionality has been built, it's possible to skip event driven indexing, but then the search functionality won't be updated in real time.
Given this, what does this mean for each of the data store alternatives?
If user generated content is stored as CMS content, the first and last of the steps needed for indexing are taken care of. Some work may be needed to ensure that, for instance, comments/replies are indexed along with their threads, meaning that some extension method may be needed for the thread page/content type and we'd have to ensure that threads are reindexed when a reply is added to them. The latter could be done using the "related pages" functionality in Find.
If user generated content is stored using EPiServer Relate/Community we'd have to exclude some properties that cause circular references from forum threads and replies. We'd also have to implement all of the other bullet points for indexing. However, doing so would probably be fairly easy as we wouldn't have to deal with all Community entity types but only a few specific ones and (as I recall) Community exposes the relevant events. Re-indexing functionality could be built using a fairly simple scheduled job.
If user generated content is stored in some sort of custom data layer, we most likely wouldn't have any problems ensuring that the objects could be indexed, as the types are likely to be fairly simple. If we were using an O/R mapper like Entity Framework that uses dynamic proxy objects, however, we'd have to instruct Find to index the objects as their non-proxy types. We'd of course have to build the indexing functionality ourselves, but as we're in control of the CRUD functionality this should be fairly easy. Reindexing functionality could, again, be built using a fairly simple scheduled job.
In a scenario where we're storing some objects as CMS content and some using a custom data layer, we'd probably be able to utilize the existing reindexing functionality. We'd have to ensure that replies/comments are indexed with their threads.
So, storing user generated content as CMS content gives us a lot of indexing functionality for free, but we'd probably have to do some work to ensure that replies/comments are indexed with their threads. The other alternatives involve more work, as we'd have to build the indexing functionality ourselves. However, building this functionality doesn't have to take very long. For instance, one project that I worked on stored comments in the DDS, and we created both event driven indexing and re-indexing functionality in about an hour or two.
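To illustrate the two pieces mentioned above (event driven indexing and a re-indexing job), here is a rough sketch in plain Python rather than .NET. It is not the Find API: `InMemoryIndex`, `CommentStore` and `reindex_all` are hypothetical names made up for this example, and the index is just a dictionary standing in for a real search index.

```python
from dataclasses import dataclass, asdict


@dataclass
class Comment:
    id: int
    thread_id: int
    author: str
    text: str


class InMemoryIndex:
    """Hypothetical stand-in for a search index client; not the Find API."""

    def __init__(self):
        self.documents = {}

    def index(self, doc_id, document):
        self.documents[doc_id] = document

    def delete(self, doc_id):
        self.documents.pop(doc_id, None)


class CommentStore:
    """Hypothetical custom data layer that owns the CRUD operations."""

    def __init__(self, index):
        self._rows = {}
        self._index = index

    def save(self, comment):
        self._rows[comment.id] = comment
        # Event driven indexing: update the index as part of the write.
        self._index.index(comment.id, asdict(comment))

    def delete(self, comment_id):
        self._rows.pop(comment_id, None)
        # Keep the index in step with deletes as well.
        self._index.delete(comment_id)

    def all(self):
        return list(self._rows.values())


def reindex_all(store, index):
    """Body of a scheduled job: push every stored object to the index again."""
    comments = store.all()
    for comment in comments:
        index.index(comment.id, asdict(comment))
    return len(comments)
```

Because the data layer owns every write, hooking the index calls into `save` and `delete` is all the event handling this sketch needs; the scheduled job only exists to rebuild the index if it is lost or gets out of sync.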
With that in mind, my recommendation would be to choose the best solution for storing user generated content without regard to indexing, as no matter which alternative you choose, building indexing functionality is likely to be a very small percentage of the work. You may want to consider whether Find can help you solve some problems that you'd otherwise face with some of the storage alternatives. For instance, if you store the content as CMS pages/content you may have trouble building non-hierarchical listings, and Find could certainly help you there.
I'm not sure whether my reply helps you. Please let me know if it doesn't :)
Thanks Joel, that is some awesome information, and gives me plenty to think about.
In terms of storage and mapping comments to threads, my idea would be that if I'm storing all content as pages (to keep the idea of easy indexing going), I would have a field on a comment item that would be the reference back to the original thread. That way I could store the comments in their own section of the tree, i.e. separate from the threads. (Not sure if there's any benefit to that approach at this stage.)
Then, when running a search to return all comments for a thread, I can just filter the comments on the "reference" field. The idea behind this is that threads would not be the only things with comments; a user profile could have comments too, for example.
I think you are right, though, that putting indexing aside, I should use the best data store for the application, not for the index, as either way there is not a lot of work involved in the indexing.
Coming back to EPiServer 7: it would not significantly change this question or the answer, correct?
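To make the idea concrete, here is a minimal sketch in plain Python. The field names `target_type` and `target_id` are made up for illustration, and in the real solution the filtering would be a Find query rather than a list comprehension.

```python
# Each comment is its own flat document with a reference back to what it
# comments on. The field names are hypothetical.
comments = [
    {"id": 1, "target_type": "thread", "target_id": 10, "text": "Nice post"},
    {"id": 2, "target_type": "profile", "target_id": 7, "text": "Welcome!"},
    {"id": 3, "target_type": "thread", "target_id": 10, "text": "Agreed"},
    {"id": 4, "target_type": "thread", "target_id": 11, "text": "Off topic"},
]


def comments_for(documents, target_type, target_id):
    """Return the comments whose reference points at the given item."""
    return [
        doc
        for doc in documents
        if doc["target_type"] == target_type and doc["target_id"] == target_id
    ]
```

With this shape, `comments_for(comments, "thread", 10)` and `comments_for(comments, "profile", 7)` use the same code path, which is the point of keeping the reference generic.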
I'm afraid I don't completely follow how you envision searching for threads and comments, but keep in mind that what you search for with Find is documents. Out-of-the-box, one object == one document. This means that if each comment is indexed on its own, you can easily search for individual comments. You can then of course link from a found comment to its thread, but you won't be able to search for threads based on all of their comments. To handle that, you can include the comments in each thread's document using, for instance, an extension method that returns all comments. In other words, you can't do joins, but you can denormalize your data.
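As a rough illustration of what denormalizing means here, the following plain Python sketch embeds the comment texts in each thread's document before it is indexed. It does not use the Find API; `build_thread_document`, `search` and the `comment_texts` field are hypothetical names, and the "search" is just a naive substring match.

```python
def build_thread_document(thread, comments):
    """Copy the thread and embed the text of its comments in the copy."""
    document = dict(thread)
    document["comment_texts"] = [
        c["text"] for c in comments if c["thread_id"] == thread["id"]
    ]
    return document


def search(documents, term):
    """Naive match over the thread title, body and embedded comments."""
    term = term.lower()
    hits = []
    for doc in documents:
        haystack = " ".join([doc["title"], doc["body"], *doc["comment_texts"]])
        if term in haystack.lower():
            hits.append(doc["id"])
    return hits
```

The trade-off is that the thread document has to be rebuilt and reindexed whenever one of its comments is added, changed or removed.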
I don't think EPiServer 7 will have any significant impact on this with the exception that you could store user generated data as a custom content type that doesn't show in the page tree.