Is there any way of searching for data using a SQL-like 'distinct' on a specific field?
Let's say I have data structured like this:
Product: Red shirt, Size: S, Code: ABC_001, Style: ABC
Product: Red shirt, Size: M, Code: ABC_001, Style: ABC
Product: Blue shirt, Size: S, Color: Blue, Style: ABC, Code: ABC_002
Product: Blue shirt, Size: L, Color: Blue, Style: ABC, Code: ABC_002
I want to retrieve products by "Code", but only get one per "Code". In other words, "Size" is not relevant. How can I do that using a Find query?
I see two possibilities:
1: Use facets. Instead of caring about the search results you retrieve a facet for "Product" (or ProductId).
2: Index objects that are already grouped.
Do you think either of those may solve your problem?
Some background for anyone interested can be found here. The gist of it is that distinct would require something called field collapsing, and while that is technically possible it brings some problems with regard to performance and scaling. Therefore it's often better to either use facets or index the same original data multiple times, as the number of documents in the index has very little impact on performance.
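Option 2 above, indexing objects that are already grouped, could look roughly like this before the data is sent to the index. This is only a sketch in plain Python, not Find client code: the `group_for_indexing` helper, the `Sizes` field and the dictionary shape are assumptions made for illustration, using the sample data from the question.

```python
# Sample variants, copied from the question.
variants = [
    {"Product": "Red shirt", "Size": "S", "Code": "ABC_001", "Style": "ABC"},
    {"Product": "Red shirt", "Size": "M", "Code": "ABC_001", "Style": "ABC"},
    {"Product": "Blue shirt", "Size": "S", "Color": "Blue", "Style": "ABC", "Code": "ABC_002"},
    {"Product": "Blue shirt", "Size": "L", "Color": "Blue", "Style": "ABC", "Code": "ABC_002"},
]


def group_for_indexing(variants, key="Code", varying="Size"):
    """Build one document per `key`, collecting the `varying` values in a list.

    Hypothetical helper: the shared fields are taken from the first variant
    seen for each key, on the assumption that they are identical in a group.
    """
    grouped = {}
    for variant in variants:
        doc = grouped.get(variant[key])
        if doc is None:
            # Copy every field except the one that differs between variants.
            doc = {k: v for k, v in variant.items() if k != varying}
            doc[varying + "s"] = []
            grouped[variant[key]] = doc
        doc[varying + "s"].append(variant[varying])
    return list(grouped.values())


documents = group_for_indexing(variants)
```

Each entry in `documents` then represents one "Code", so an ordinary search over those documents returns at most one hit per "Code" without any distinct step.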
I sort of have it working now, but that is by grouping by Code after the search result is retrieved. It would be better to only get the relevant items from Find, and not have to filter afterwards.
I think the facet approach is promising, but I worry that I need more data than the facet key. I'll give it a try and come back to you.
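The after-the-fact grouping described here amounts to something like the following. It is a minimal sketch in plain Python rather than Find client code; the `hits` list and the `distinct_by` helper are made up for illustration.

```python
# Hypothetical search hits, mirroring the sample data in the question.
hits = [
    {"Product": "Red shirt", "Size": "S", "Code": "ABC_001", "Style": "ABC"},
    {"Product": "Red shirt", "Size": "M", "Code": "ABC_001", "Style": "ABC"},
    {"Product": "Blue shirt", "Size": "S", "Color": "Blue", "Style": "ABC", "Code": "ABC_002"},
    {"Product": "Blue shirt", "Size": "L", "Color": "Blue", "Style": "ABC", "Code": "ABC_002"},
]


def distinct_by(hits, field):
    """Keep the first hit for each value of `field`, preserving hit order."""
    first_seen = {}
    for hit in hits:
        # setdefault only stores the hit if this field value is new.
        first_seen.setdefault(hit[field], hit)
    return list(first_seen.values())


unique_hits = distinct_by(hits, "Code")
```

The drawback is that the filtering happens after the hits have been fetched, so paging and total counts no longer line up with what is displayed.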
The facet approach will be a bananazillion times faster than any result set filtering. My guess is that you also might be able to construct a facet field value in a way that you can parse it afterwards, like name:somevalue:someothervalue, as the some* values should be the same for every item in the group (otherwise they wouldn't be distinct).
Assuming the result set above consists of variants and the catalog structure is like this:
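The composite facet value idea can be sketched as follows. This is plain Python that only simulates a terms facet with a `Counter`; it does not call Find. The `facet_key` and `parse_facet_key` helpers and the choice of fields are assumptions, and the sketch assumes that none of the values contain the separator.

```python
from collections import Counter

SEPARATOR = ":"  # assumed never to appear inside a field value

# Sample items, copied from the question (only the shared fields are needed).
items = [
    {"Product": "Red shirt", "Size": "S", "Code": "ABC_001", "Style": "ABC"},
    {"Product": "Red shirt", "Size": "M", "Code": "ABC_001", "Style": "ABC"},
    {"Product": "Blue shirt", "Size": "S", "Code": "ABC_002", "Style": "ABC"},
    {"Product": "Blue shirt", "Size": "L", "Code": "ABC_002", "Style": "ABC"},
]


def facet_key(item):
    """Pack the fields shared by a group into one facet value."""
    return SEPARATOR.join([item["Code"], item["Product"], item["Style"]])


def parse_facet_key(key):
    """Unpack a facet value back into its fields."""
    code, product, style = key.split(SEPARATOR)
    return {"Code": code, "Product": product, "Style": style}


# Stand-in for the facet the search engine would return: value -> hit count.
facet_counts = Counter(facet_key(item) for item in items)

# One entry per distinct Code, rebuilt from the facet values alone.
distinct_products = [parse_facet_key(key) for key in facet_counts]
```

In a real index the composite value would be stored as an extra field on each document when it is indexed, and the facet would be requested on that field.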
Product 1...Variant 1
You can add the MetaDataType 'Product' to your search criteria to get back results for products only; the result will then contain unique IDs for the products.
Is this approach still supported in Find 8.X? I want to index products and variants with Find, and then have Find search through all products and variants but return only the distinct parent product objects that match.
An example code snippet would be appreciated.
I am also interested to hear what is possible based on Syed's post. I have data on the variant level that I want to facet on, but I want the facet counts to be based on the distinct parent products that are returned. Another possibility would be to do the search on the SKU level, but then generate the facet counts based on a distinct property of the results. So if I had 3 variants that belong to the same product, a search might match 2 of the 3 variants, but because the parent ID is the same, it would only return a count of 1 for that set of SKUs.
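To make the counting I have in mind concrete, here is a minimal sketch in plain Python. It is not Find code and does not show how Find would compute this; the `ParentId` and `Color` fields, the SKU values and the `facet_counts_by_parent` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical variants matched by a search. P1 has three variants in total,
# but only two of them matched.
matched_variants = [
    {"Sku": "A-S", "ParentId": "P1", "Color": "Red"},
    {"Sku": "A-M", "ParentId": "P1", "Color": "Red"},
    {"Sku": "B-S", "ParentId": "P2", "Color": "Red"},
    {"Sku": "B-L", "ParentId": "P2", "Color": "Blue"},
]


def facet_counts_by_parent(variants, facet_field, parent_field="ParentId"):
    """Count distinct parent products per facet value, not matching variants."""
    parents_per_value = defaultdict(set)
    for variant in variants:
        # A set ignores repeated parent IDs, so each product counts once.
        parents_per_value[variant[facet_field]].add(variant[parent_field])
    return {value: len(parents) for value, parents in parents_per_value.items()}


counts = facet_counts_by_parent(matched_variants, "Color")
```

Here "Red" matches three variants but only two parent products, so the count I would want to show is 2 rather than 3.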