Paras Diwan - 1 month ago
Question

Solr grouped query pagination not working properly. [Solr, Lucene]

I have grouped my Solr documents by the field family. The Solr query for getting the first 20 groups is as follows:

/select?q=*:*&group=true&group.field=family&group.ngroups=true&start=0&group.limit=1


The result of this query is 20 groups, as follows (only the first group is shown):

responseHeader: {
  zkConnected: true,
  status: 0,
  QTime: 1260,
  params: {
    q: "*:*",
    group.limit: "1",
    start: "0",
    group.ngroups: "true",
    group.field: "family",
    group: "true"
  }
},
grouped: {
  family: {
    matches: 464779,
    ngroups: 396324,
    groups: [
      {
        groupValue: "__fam__ME.EA.HE.728928",
        doclist: {
          numFound: 1,
          start: 0,
          maxScore: 1,
          docs: [
            {
              sku: "ME.EA.HE.728928",
              title: "Rexton Pocket Family Hearing Instrument Fusion",
              family: "__fam__ME.EA.HE.728928",
              brand: "Rexton",
              brandId: "6739",
              inStock: false,
              bulkDiscount: false,
              quoteOnly: false,
              cats: [
                "Hearing Machine & Components",
                "Health & Personal Care",
                "Medical Supplies & Equipment"
              ],
              leafCatIds: [
                "6038"
              ],
              parentCatIds: [
                "6259",
                "4913"
              ],
              Type__attr__: "Pocket Family",
              Type of Products__attr__: "Hearing Instrument",
              price: 3790,
              discount: 40,
              createdAt: "2016-02-18T04:51:36Z",
              moq: 1,
              offerPrice: 2255,
              suggestKeywords: [
                "Rexton",
                "Pocket Family",
                "Rexton Pocket Family"
              ],
              suggestPayload: "6038,Hearing Machine & Components",
              _version_: 1548082328946868200
            }
          ]
        }
      },


The thing to notice in this result is the value of ngroups, which is 396324.


But when I want to get the data for the last page, I run this query against Solr:

/select?q=*:*&group=true&group.field=family&group.ngroups=true&start=396320&group.limit=1


{
  responseHeader: {
    zkConnected: true,
    status: 0,
    QTime: 5238,
    params: {
      q: "*:*",
      group.limit: "1",
      start: "396320",
      group.ngroups: "true",
      group.field: "family",
      group: "true"
    }
  },
  grouped: {
    family: {
      matches: 464779,
      ngroups: 396324,
      groups: [ ]
    }
  }
}


I get 0 results when I set start to 396320, even though with ngroups at 396324 there should still be 4 groups in the result (offsets 396320 through 396323). The actual number of groups is 386887. Why is ngroups incorrect?

By the way, this issue does not occur on the local Solr server I have set up. It only shows up in SolrCloud on the test environment.

Answer

This is a known issue with grouping across distributed nodes (which is what happens in SolrCloud mode):

Grouping is supported for distributed searches, with some caveats:

Currently group.func is not supported in any distributed searches

group.ngroups and group.facet require that all documents in each group must be co-located on the same shard in order for accurate counts to be returned. Document routing via composite keys can be a useful solution in many situations.
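
To make the caveat concrete, here is a toy model in Python rather than anything taken from Solr itself. It assumes the reported ngroups behaves like a sum of per-shard distinct group counts, so a family whose documents sit on two shards is counted twice. The shard layouts, SKUs, and family names are made up for illustration.

```python
# Toy model of distributed group counting (not Solr code).
# Each shard is a list of (sku, family) pairs.

def reported_ngroups(shards):
    """Sum of per-shard distinct family counts.

    Assumption: the coordinating node adds up what each shard reports
    and cannot tell that two shards hold the same family.
    """
    return sum(len({family for _, family in docs}) for docs in shards)


def true_ngroups(shards):
    """Distinct families across the whole collection."""
    return len({family for docs in shards for _, family in docs})


# famA is split across both shards, so it is counted twice.
scattered = [
    [("sku1", "famA"), ("sku2", "famB")],
    [("sku3", "famA"), ("sku4", "famC")],
]

# Every family lives on exactly one shard.
colocated = [
    [("sku1", "famA"), ("sku3", "famA")],
    [("sku2", "famB"), ("sku4", "famC")],
]

print(reported_ngroups(scattered), true_ngroups(scattered))
print(reported_ngroups(colocated), true_ngroups(colocated))
```

Under this model the over-count disappears as soon as all documents of a family are co-located, which is consistent with the issue not appearing on a single local server.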

The most direct solution is to use the family as part of the routing key, ensuring that all documents with the same family value end up on the same node. Since the number of distinct family values seems to be very high compared to the number of nodes, this should still give you a good distribution of documents across nodes.
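
A minimal sketch of what that could look like at indexing time, assuming the collection uses the compositeId router and that the unique key field is called id (neither is confirmed by the question). With that router, an ID of the form prefix!rest is routed by the part before the exclamation mark. The helper name composite_id is hypothetical, and existing documents would need to be reindexed with the new IDs.

```python
def composite_id(family: str, sku: str) -> str:
    """Build a 'family!sku' document ID for compositeId routing.

    Hypothetical helper: the part before '!' acts as the routing prefix,
    so all documents sharing a family hash to the same shard.
    """
    if "!" in family:
        # '!' is the separator, so it must not appear in the prefix.
        raise ValueError("family must not contain '!'")
    return f"{family}!{sku}"


# Values taken from the sample document in the question.
doc = {
    "sku": "ME.EA.HE.728928",
    "family": "__fam__ME.EA.HE.728928",
}
# Assumption: 'id' is the unique key field of the collection.
doc["id"] = composite_id(doc["family"], doc["sku"])
print(doc["id"])
```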

Depending on what you're actually trying to do, there might be alternative solutions as well (if you just want a count, a JSON facet might be a good fit).
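
As a rough illustration of the JSON facet route, the sketch below only builds the request parameters and does not contact a server. It uses the unique() aggregation from Solr's JSON Facet API. The facet label family_count and the function name are made up, and whether the distributed count is exact at this cardinality is worth checking against the Solr documentation.

```python
import json
from urllib.parse import urlencode


def family_count_params(field: str = "family") -> dict:
    """Parameters for a request that counts distinct values of `field`.

    'family_count' is an arbitrary label chosen for this example.
    rows=0 skips returning documents, since only the count is wanted.
    """
    facet = {"family_count": f"unique({field})"}
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


params = family_count_params()
# Append this to the collection's /select handler URL.
query_string = urlencode(params)
print(query_string)
```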