Elasticsearch OR query for comma-separated values
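I store IDs as a comma-separated string in the database and index them into Elasticsearch. Now I need to retrieve the rows whose user_ids contain a given value.
For example, the user_ids column is stored in the index like this (the database column type is varchar(500); in Elasticsearch it is text):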
8938,8936,8937
$userId = 8936; // Example: this should return the row above
$whereCondition = [];
$whereCondition[] = [
    "query_string" => [
        "query" => $userId,
        "default_field" => "user_ids",
        "default_operator" => "OR"
    ]
];
$searchParams = [
    'query' => [
        'bool' => [
            'must' => [
                $whereCondition
            ],
            'must_not' => [
                ['exists' => ['field' => 'deleted_at']]
            ]
        ]
    ],
    "size" => 10000
];
User::search($searchParams);
JSON query:
{
  "query": {
    "bool": {
      "must": [
        [{
          "query_string": {
            "query": 8936,
            "default_field": "user_ids",
            "default_operator": "OR"
          }
        }]
      ],
      "must_not": [
        [{
          "exists": {
            "field": "deleted_at"
          }
        }]
      ]
    }
  },
  "size": 10000
}
Mapping details:
{
  "user_details_index": {
    "aliases": {},
    "mappings": {
      "test_type": {
        "properties": {
          "created_at": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss"
          },
          "deleted_at": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss"
          },
          "updated_at": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss"
          },
          "user_ids": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1546404165500",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "krpph26NTv2ykt6xE05klQ",
        "version": {
          "created": "6020299"
        },
        "provided_name": "user_details_index"
      }
    }
  }
}
I tried the logic above, but it does not retrieve the row. Can anyone help?
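Since the user_ids field is of type text and no analyzer is specified for it, it uses the standard analyzer by default. That analyzer does not break 8938,8936,8937 into the terms 8938, 8936 and 8937 (a comma with a digit on both sides does not end a token, so the whole value is indexed as the single term 8938,8936,8937), and therefore the ID cannot match.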
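The reliable way to confirm this on your own cluster is the _analyze API (POST _analyze with "analyzer": "standard" and "text": "8938,8936,8937"), which returns the tokens actually produced. As an offline illustration only, here is a rough Python approximation. approx_standard_tokens is a hypothetical name, and the regex models just the digit/comma/digit rule of Unicode word segmentation, not the full Lucene standard tokenizer.

```python
import re

# Rough, hypothetical approximation of ONE word-boundary rule the standard
# tokenizer follows: a comma or period with a digit on both sides does not
# end a token. Everything else about the real tokenizer is simplified away.
_TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+")


def approx_standard_tokens(text: str) -> list[str]:
    """Return lowercased tokens, roughly as the standard analyzer would."""
    return [token.lower() for token in _TOKEN_RE.findall(text)]


if __name__ == "__main__":
    # The whole CSV value survives as a single term...
    print(approx_standard_tokens("8938,8936,8937"))  # ['8938,8936,8937']
    # ...so looking up the single term "8936" finds nothing.
    print("8936" in approx_standard_tokens("8938,8936,8937"))  # False
```

Under this approximation a space after each comma ("8938, 8936, 8937") would yield three separate tokens, but storing a real array, as suggested next, is the sturdier fix because it does not depend on analyzer details at all.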
To solve this, I suggest storing an array of IDs in the user_ids field instead of a CSV string. So when you index, your JSON input should look like this:
{
  ...
  "user_ids": [
    8938,
    8936,
    8937
  ]
  ...
}
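The conversion from the stored varchar value to an array can happen wherever the documents are built before indexing. A minimal Python sketch, assuming each database row arrives as a dict; csv_to_ids and build_user_doc are hypothetical names. In PHP the core step would be roughly array_map('intval', explode(',', $value)), although that version turns blank entries into 0 instead of skipping them.

```python
from typing import List, Optional


def csv_to_ids(raw: Optional[str]) -> List[int]:
    """Convert a DB value such as "8938,8936,8937" into [8938, 8936, 8937].

    Blank entries (e.g. from "8938,,8936" or a trailing comma) are skipped;
    a non-numeric entry raises ValueError so bad data is not silently indexed.
    """
    if not raw:
        return []
    parts = (part.strip() for part in raw.split(","))
    return [int(part) for part in parts if part]


def build_user_doc(row: dict) -> dict:
    """Hypothetical helper: copy a DB row and replace the CSV with an int list."""
    doc = dict(row)  # leave the caller's row untouched
    doc["user_ids"] = csv_to_ids(row.get("user_ids"))
    return doc


if __name__ == "__main__":
    row = {"user_ids": "8938,8936,8937", "deleted_at": None}
    print(build_user_doc(row))  # {'user_ids': [8938, 8936, 8937], 'deleted_at': None}
```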
Since the user IDs are integer values, the mapping should also be changed as follows:
{
  "user_ids": {
    "type": "integer"
  }
}
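Note that Elasticsearch does not let you change the type of an existing field in place, so this mapping has to go into a new index. Copying the old documents over with the Reindex API would likely fail, because "8938,8936,8937" is not a valid integer, so re-indexing from the database in the array form is probably simplest. A sketch of the create-index body in Python, assuming a 6.x cluster (the version shown is 6020299) that keeps the existing "test_type" mapping type; new_index_body is a hypothetical name.

```python
import json

# Same date mapping the existing index uses for its timestamp columns.
DATE_FIELD = {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"}


def new_index_body(type_name: str = "test_type") -> dict:
    """Create-index body for a 6.x cluster with user_ids mapped as integer.

    6.x indices still carry a single mapping type, which the existing index
    calls "test_type"; on 7.x and later that level is dropped.
    """
    return {
        "mappings": {
            type_name: {
                "properties": {
                    "created_at": dict(DATE_FIELD),
                    "deleted_at": dict(DATE_FIELD),
                    "updated_at": dict(DATE_FIELD),
                    # An integer field accepts a single value or an array of values.
                    "user_ids": {"type": "integer"},
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(new_index_body(), indent=2))
```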
Then query like this:
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "user_ids": [8936]
          }
        }
      ],
      "must_not": [
        {
          "exists": {
            "field": "deleted_at"
          }
        }
      ]
    }
  },
  "size": 10000
}
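Two details matter when building this body from code. The field name in the terms filter has to match the mapped field, user_ids. Each bool clause should also be a flat list of query objects: the question's PHP wraps $whereCondition, which is already an array, in another array, which is where the [[{...}]] nesting comes from. A terms filter matches a document if any listed value equals any value in its user_ids array, so passing several IDs gives the OR behaviour from the title. A minimal Python sketch; user_ids_query and matches_locally are hypothetical names, and the same nested shape can be written as a PHP array like $searchParams.

```python
from typing import Iterable, List


def user_ids_query(user_ids: Iterable[int], size: int = 10000) -> dict:
    """Build the bool query for one or more IDs (OR semantics via terms)."""
    ids: List[int] = list(user_ids)
    return {
        "query": {
            "bool": {
                # Each clause is a flat list of query objects, not a list of lists.
                "filter": [{"terms": {"user_ids": ids}}],
                "must_not": [{"exists": {"field": "deleted_at"}}],
            }
        },
        "size": size,
    }


def matches_locally(doc: dict, user_ids: Iterable[int]) -> bool:
    """Hypothetical in-memory check mirroring what the query selects:
    any overlap with user_ids, and deleted_at missing or null."""
    wanted = set(user_ids)
    return bool(wanted & set(doc.get("user_ids", []))) and doc.get("deleted_at") is None


if __name__ == "__main__":
    doc = {"user_ids": [8938, 8936, 8937], "deleted_at": None}
    print(user_ids_query([8936]))
    print(matches_locally(doc, [8936]))  # True
    print(matches_locally(doc, [1234]))  # False
```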