Manticore - SphinxQL GROUP BY returns duplicated group IDs
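When I use the GROUP BY syntax in Manticore, the result contains duplicated group IDs. We have just migrated from Sphinx 2.x to the latest Manticore, and the same query caused no problems in Sphinx.
Here is the SphinxQL query: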
SELECT model_id, model_root, model_name FROM search WHERE model_id != 0 GROUP BY model_root WITHIN GROUP ORDER BY model_level ASC ORDER BY model_level ASC, model_occurrence DESC, model_name ASC LIMIT 0, 13
The query groups by model_root, yet the result contains a duplicated key at positions 10 and 11 (Canon), which is not what I expect.
Here is the result:
array:13 [▼
0 => array:3 [▼
"model_id" => "62763"
"model_root" => "62763"
"model_name" => "HP"
]
1 => array:3 [▼
"model_id" => "72771"
"model_root" => "72771"
"model_name" => "Sony"
]
2 => array:3 [▼
"model_id" => "72524"
"model_root" => "72524"
"model_name" => "Compaq"
]
3 => array:3 [▼
"model_id" => "62783"
"model_root" => "62783"
"model_name" => "Samsung"
]
4 => array:3 [▼
"model_id" => "62760"
"model_root" => "62760"
"model_name" => "Asus"
]
5 => array:3 [▼
"model_id" => "62761"
"model_root" => "62761"
"model_name" => "Toshiba"
]
6 => array:3 [▼
"model_id" => "85086"
"model_root" => "85086"
"model_name" => "Panasonic"
]
7 => array:3 [▼
"model_id" => "151763"
"model_root" => "151763"
"model_name" => "Acer"
]
8 => array:3 [▼
"model_id" => "72548"
"model_root" => "72548"
"model_name" => "Packard Bell"
]
9 => array:3 [▼
"model_id" => "62762"
"model_root" => "62762"
"model_name" => "Lenovo"
]
10 => array:3 [▼
"model_id" => "83072"
"model_root" => "83072"
"model_name" => "Canon"
]
11 => array:3 [▼
"model_id" => "83072"
"model_root" => "83072"
"model_name" => "Canon"
]
12 => array:3 [▼
"model_id" => "73476"
"model_root" => "73476"
"model_name" => "LG"
]
]
Expected result:
array:13 [▼
0 => array:3 [▼
"model_id" => "62763"
"model_root" => "62763"
"model_name" => "HP"
]
1 => array:3 [▼
"model_id" => "72771"
"model_root" => "72771"
"model_name" => "Sony"
]
2 => array:3 [▼
"model_id" => "72524"
"model_root" => "72524"
"model_name" => "Compaq"
]
3 => array:3 [▼
"model_id" => "62783"
"model_root" => "62783"
"model_name" => "Samsung"
]
4 => array:3 [▼
"model_id" => "62760"
"model_root" => "62760"
"model_name" => "Asus"
]
5 => array:3 [▼
"model_id" => "62761"
"model_root" => "62761"
"model_name" => "Toshiba"
]
6 => array:3 [▼
"model_id" => "85086"
"model_root" => "85086"
"model_name" => "Panasonic"
]
7 => array:3 [▼
"model_id" => "151763"
"model_root" => "151763"
"model_name" => "Acer"
]
8 => array:3 [▼
"model_id" => "72548"
"model_root" => "72548"
"model_name" => "Packard Bell"
]
9 => array:3 [▼
"model_id" => "62762"
"model_root" => "62762"
"model_name" => "Lenovo"
]
10 => array:3 [▼
"model_id" => "83072"
"model_root" => "83072"
"model_name" => "Canon"
]
11 => array:3 [▼
"model_id" => "73476"
"model_root" => "73476"
"model_name" => "LG"
]
12 => array:3 [▼
"model_id" => "73266"
"model_root" => "73266"
"model_name" => "Fujitsu"
]
]
Here is the index definition:
index search
{
type = plain
source = search
path = /var/lib/manticore/data/search
min_word_len = 1
dict = keywords
min_prefix_len = 1
index_field_lengths = 1
charset_table = 0..9,non_cjk,-,.,/,"
}
And the required fields in the source definition:
sql_attr_uint = model_id
sql_attr_uint = model_root
sql_field_string = model_name
Any idea what is wrong with the query or the index definition?
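I have reproduced your issue. Yes, Manticore behaves differently from Sphinx 2.x in this case, and the default max_matches value (1000) is most likely not enough. For the test you provided, max_matches=1025 should be sufficient (whereas in Sphinx 2.2 the value was 892). For your production case, please experiment to find the best value yourself.
Please read here how max_matches affects grouping results: https://docs.manticoresearch.com/latest/html/searching/grouping_clustering_search_results.html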
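As a minimal sketch of the suggestion above, the limit can be raised per query through the SphinxQL OPTION clause. The query is the one from the question. The value 1025 is only the figure reported to work for the test data, so treat it as a starting point rather than a recommended production setting.

```sql
-- Same query as in the question, with the match limit raised for this query only.
-- 1025 is the value that was enough for the provided test data; tune it for real data.
SELECT model_id, model_root, model_name
FROM search
WHERE model_id != 0
GROUP BY model_root
WITHIN GROUP ORDER BY model_level ASC
ORDER BY model_level ASC, model_occurrence DESC, model_name ASC
LIMIT 0, 13
OPTION max_matches=1025;
```

If the duplicated Canon row disappears with the higher limit, that supports the explanation that grouping was running out of room under the default of 1000.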