Solr grouped query pagination not working properly. [Solr, Lucene]
I have grouped my Solr documents by the field family.
The Solr query to get the first 20 groups is as follows:
/select?q=*:*&group=true&group.field=family&group.ngroups=true&start=0&group.limit=1
This query returns 20 groups, as follows:
responseHeader: {
zkConnected: true,
status: 0,
QTime: 1260,
params: {
q: "*:*",
group.limit: "1",
start: "0",
group.ngroups: "true",
group.field: "family",
group: "true"
}
},
grouped: {
family: {
matches: 464779,
ngroups: 396324,
groups: [
{
groupValue: "__fam__ME.EA.HE.728928",
doclist: {
numFound: 1,
start: 0,
maxScore: 1,
docs: [
{
sku: "ME.EA.HE.728928",
title: "Rexton Pocket Family Hearing Instrument Fusion",
family: "__fam__ME.EA.HE.728928",
brand: "Rexton",
brandId: "6739",
inStock: false,
bulkDiscount: false,
quoteOnly: false,
cats: [
"Hearing Machine & Components",
"Health & Personal Care",
"Medical Supplies & Equipment"
],
leafCatIds: [
"6038"
],
parentCatIds: [
"6259",
"4913"
],
Type__attr__: "Pocket Family",
Type of Products__attr__: "Hearing Instrument",
price: 3790,
discount: 40,
createdAt: "2016-02-18T04:51:36Z",
moq: 1,
offerPrice: 2255,
suggestKeywords: [
"Rexton",
"Pocket Family",
"Rexton Pocket Family"
],
suggestPayload: "6038,Hearing Machine & Components",
_version_: 1548082328946868200
}
]
}
},
The important thing to note in this result is the value of ngroups, which is 396324.
But when I want to fetch the last page of data, I send this query to Solr:
select?q=*:*&group=true&group.field=family&group.ngroups=true&start=396320&group.limit=1
{
responseHeader: {
zkConnected: true,
status: 0,
QTime: 5238,
params: {
q: "*:*",
group.limit: "1",
start: "396320",
group.ngroups: "true",
group.field: "family",
group: "true"
}
},
grouped: {
family: {
matches: 464779,
ngroups: 396324,
groups: [ ]
}
}
}
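When I set start to 396320, the result contains 0 groups. There should be 5 documents in the result. The actual number of groups is 386887. Why is ngroups incorrect?
By the way, this problem does not occur on the local Solr server I set up. It only appears on the SolrCloud cluster in the test environment.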
This is a known issue with grouping across distributed nodes (which is what happens in SolrCloud mode):
Grouping is supported for distributed searches, with some caveats:
Currently group.func is not supported in any distributed searches.
group.ngroups and group.facet require that all documents in each group must be co-located on the same shard in order for accurate counts to be returned. Document routing via composite keys can be a useful solution in many situations.
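To illustrate why the count comes out too high, here is a minimal Python sketch. It assumes (this is an assumption, not something the quoted documentation states) that the reported ngroups is the sum of the distinct group counts found on each shard. The documents, family values, and routing rules are made up for the example.

```python
from collections import defaultdict

def summed_ngroups(docs, shard_of):
    """Add up the distinct group counts per shard (assumed merge behaviour)."""
    groups_per_shard = defaultdict(set)
    for doc_id, family in docs:
        groups_per_shard[shard_of(doc_id, family)].add(family)
    return sum(len(groups) for groups in groups_per_shard.values())

# Hypothetical data: (document id, family value)
docs = [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (5, "C")]
true_groups = len({family for _, family in docs})

# Routing by document id can split one family across both shards,
# so that family is counted once per shard.
by_id = summed_ngroups(docs, lambda doc_id, family: doc_id % 2)

# Toy rule that depends only on the family value, so each family
# lives on exactly one shard and is counted once.
by_family = summed_ngroups(docs, lambda doc_id, family: ord(family) % 2)

print(true_groups, by_id, by_family)  # 3 5 3
```

With id-based routing the sum is 5 although only 3 groups exist, which mirrors the gap between 396324 and 386887 in the question.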
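The most direct solution is to use family as part of the routing key, which ensures that all documents with the same family value end up on the same node. Since the number of distinct family values appears to be very large compared with the number of nodes, this should still give you a good distribution of documents across nodes.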
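A minimal sketch of that idea in Python, assuming the collection uses Solr's compositeId router, where the part of the document id before the "!" determines the shard. The helper name routed_id, the unique key field name id, and the use of sku as the unique part are assumptions for illustration, not taken from the question's schema. Existing documents would have to be reindexed with the new ids.

```python
def routed_id(family: str, sku: str) -> str:
    """Build an id of the form '<family>!<sku>' for compositeId routing."""
    if "!" in family:
        # '!' separates the routing prefix from the rest of the id.
        raise ValueError("routing prefix must not contain '!'")
    return f"{family}!{sku}"

# Document fields copied from the sample response in the question.
doc = {"sku": "ME.EA.HE.728928", "family": "__fam__ME.EA.HE.728928"}
doc["id"] = routed_id(doc["family"], doc["sku"])  # assumed unique key field

print(doc["id"])  # __fam__ME.EA.HE.728928!ME.EA.HE.728928
```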
Depending on what you are actually trying to do, there may be other alternatives (if you only want a count, using a JSON facet may be a good solution).
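If only the number of distinct family values is needed, one possible shape for such a request is sketched below in Python. It uses the unique() aggregation of the JSON Facet API. The facet label and rows=0 are choices made for this example, and the result may be an estimate rather than an exact count for high-cardinality fields, so check the documentation for your Solr version.

```python
import json
from urllib.parse import urlencode

def distinct_count_params(field: str, query: str = "*:*") -> str:
    """Build query-string parameters asking for a distinct count of `field`."""
    # The label '<field>_count' is an arbitrary name chosen for this sketch.
    facet = {f"{field}_count": f"unique({field})"}
    # rows=0: only the facet result is wanted, not the documents themselves.
    return urlencode({"q": query, "rows": 0, "json.facet": json.dumps(facet)})

params = distinct_count_params("family")
print("/select?" + params)
```

The count would then be read from the facets section of the response under the chosen label.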