How to match exact document data in Elasticsearch using a DSL query?
My tokenizer:
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
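I am trying to search for values on this field, and I want the search to work on tokens. If I search with the token s, I should get the items that match or start with s. If I search with sp, I want only the items that start with sp, and everything else should be discarded. Instead I also get values that do not start with sp. I can't tell whether my query is wrong or whether I am using the wrong filter. Can someone please help me solve this?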
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "PRODUCT",
"fields": [
"item",
"data1"
]
}
},
{
"multi_match": {
"query": "SUB_FAMILY",
"fields": [
"item",
"data1"
]
}
},
{
"match": {
"values": "SP"
}
}
]
}
}
}
The output of this query is:
"hits": [
{
"_index": "logs_datas",
"_type": "_doc",
"_id": "H1PfEnkBQXpKNrJSp8bV",
"_score": 9.418445,
"_source": {
"message": "PRODUCT,SUB_FAMILY,SPRINHO2H",
"path": "/home/elasticsearchDatas.csv",
"hierarchy_name": "PRODUCT",
"@version": "1",
"@timestamp": "2021-04-27T10:28:37.578Z",
"host": "ewiglp71",
"item_pk": "SPRINHO2H",
"attribute_name": "SUB_FAMILY"
}
},
{
"_index": "logs_datas",
"_type": "_doc",
"_id": "y1PfEnkBQXpKNrJSp8XQ",
"_score": 5.3059187,
"_source": {
"message": "PRODUCT,SUB_FAMILY,SCMLPLWVI",
"path": "/home/niteshb/elasticsearchDatas.csv",
"hierarchy_name": "PRODUCT",
"@version": "1",
"@timestamp": "2021-04-27T10:28:37.577Z",
"host": "ewiglp71",
"item_pk": "SCMLPLWVI",
"attribute_name": "SUB_FAMILY"
}
},
{
"_index": "logs_datas",
"_type": "_doc",
"_id": "zFPfEnkBQXpKNrJSp8XQ",
"_score": 5.3059187,
"_source": {
"message": "PRODUCT,SUB_FAMILY,SSVRKEN2Z",
"path": "/home/elasticsearchDatas.csv",
"hierarchy_name": "PRODUCT",
"@version": "1",
"@timestamp": "2021-04-27T10:28:37.579Z",
"host": "ewiglp71",
"item_pk": "SSVRKEN2Z",
"attribute_name": "SUB_FAMILY"
}
}
]
}
}
Since min_gram is 1, the tokens generated for SCMLPLWVI will be:
{
"tokens": [
{
"token": "S",
"start_offset": 0,
"end_offset": 1,
"type": "word",
"position": 0
},
{
"token": "SC",
"start_offset": 0,
"end_offset": 2,
"type": "word",
"position": 1
},
{
"token": "SCM",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 2
},
{
"token": "SCML",
"start_offset": 0,
"end_offset": 4,
"type": "word",
"position": 3
},
{
"token": "SCMLP",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 4
},
{
"token": "SCMLPL",
"start_offset": 0,
"end_offset": 6,
"type": "word",
"position": 5
},
{
"token": "SCMLPLW",
"start_offset": 0,
"end_offset": 7,
"type": "word",
"position": 6
},
{
"token": "SCMLPLWV",
"start_offset": 0,
"end_offset": 8,
"type": "word",
"position": 7
},
{
"token": "SCMLPLWVI",
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 8
}
]
}
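The token list above can be reproduced with a small Python sketch. This is only a rough simulation of the edge_ngram tokenizer, not Elasticsearch's implementation: the function name `edge_ngrams` is hypothetical, and the regular expression is an approximation of `token_chars: ["letter", "digit"]`.

```python
import re

def edge_ngrams(text, min_gram=1, max_gram=10):
    """Approximate the edge_ngram tokenizer for letter/digit token_chars."""
    tokens = []
    # Split the input into runs of letters and digits (assumed approximation).
    for word in re.findall(r"[^\W_]+", text):
        # Emit every prefix of the word from min_gram up to max_gram characters.
        for length in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:length])
    return tokens

print(edge_ngrams("SCMLPLWVI"))
```

The single-character token S is the important one here: every value that starts with S carries it in the index.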
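Assuming the same analyzer also runs at search time, the query SP is itself split into the tokens S and SP, and the S token matches every value that starts with S. If you want to get only the values starting with sp, you need to change the tokenizer to: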
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 2, // note this
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
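To see why raising min_gram changes the result, here is a minimal Python sketch of the matching logic. It rests on two assumptions that the question does not confirm: the same edge_ngram analyzer is applied to the query string, and the match query uses its default OR operator, so one shared token is enough for a hit. The names `edge_ngrams`, `matches`, `search` and `ITEMS` are hypothetical.

```python
def edge_ngrams(word, min_gram, max_gram=10):
    """Prefixes of a single word, from min_gram to max_gram characters."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]

def matches(value, query, min_gram):
    """A document matches if it shares at least one token with the query (OR)."""
    return bool(set(edge_ngrams(value, min_gram)) & set(edge_ngrams(query, min_gram)))

# The three item_pk values from the question's search output.
ITEMS = ["SPRINHO2H", "SCMLPLWVI", "SSVRKEN2Z"]

def search(query, min_gram):
    return [value for value in ITEMS if matches(value, query, min_gram)]

print(search("SP", 1))  # all three values: the query token "S" matches each of them
print(search("SP", 2))  # ['SPRINHO2H']
print(search("S", 2))   # []: a one-character query yields no tokens when min_gram is 2
```

The last line shows the trade-off in this simulation: with min_gram set to 2, a search for a single s can no longer match anything.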
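Update 1:
You can use match_bool_prefix to search for words that start with s or sp.
Here is a working example.
Index mapping (a plain text field with the default analyzer, no edge_ngram tokenizer):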
{
"mappings": {
"properties": {
"item_pk": {
"type": "text"
}
}
}
}
Search query 1:
{
"query": {
"match_bool_prefix" : {
"item_pk" : "s"
}
}
}
The search result will be:
"hits": [
{
"_index": "67281810",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"message": "PRODUCT,SUB_FAMILY,SPRINHO2H",
"path": "/home/niteshb/elasticsearchDatas.csv",
"hierarchy_name": "PRODUCT",
"@version": "1",
"@timestamp": "2021-04-27T10:28:37.578Z",
"host": "ewiglp71",
"item_pk": "SPRINHO2H",
"attribute_name": "SUB_FAMILY"
}
},
{
"_index": "67281810",
"_type": "_doc",
"_id": "i7quE3kB6jKCA-nFYii6",
"_score": 1.0,
"_source": {
"message": "PRODUCT,SUB_FAMILY,SCMLPLWVI",
"path": "/home/niteshb/elasticsearchDatas.csv",
"hierarchy_name": "PRODUCT",
"@version": "1",
"@timestamp": "2021-04-27T10:28:37.577Z",
"host": "ewiglp71",
"item_pk": "SCMLPLWVI",
"attribute_name": "SUB_FAMILY"
}
},
{
"_index": "67281810",
"_type": "_doc",
"_id": "jLquE3kB6jKCA-nFgiju",
"_score": 1.0,
"_source": {
"message": "PRODUCT,SUB_FAMILY,SSVRKEN2Z",
"path": "/home/niteshb/elasticsearchDatas.csv",
"hierarchy_name": "PRODUCT",
"@version": "1",
"@timestamp": "2021-04-27T10:28:37.579Z",
"host": "ewiglp71",
"item_pk": "SSVRKEN2Z",
"attribute_name": "SUB_FAMILY"
}
}
]
Search query 2:
{
"query": {
"match_bool_prefix" : {
"item_pk" : "sp"
}
}
}
Search result:
"hits": [
{
"_index": "67281810",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"message": "PRODUCT,SUB_FAMILY,SPRINHO2H",
"path": "/home/niteshb/elasticsearchDatas.csv",
"hierarchy_name": "PRODUCT",
"@version": "1",
"@timestamp": "2021-04-27T10:28:37.578Z",
"host": "ewiglp71",
"item_pk": "SPRINHO2H",
"attribute_name": "SUB_FAMILY"
}
}
]
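The two results above follow from how match_bool_prefix treats its input: the query text is analyzed, every term except the last becomes a term clause, and the last term becomes a prefix clause. The Python sketch below imitates that behaviour under simplifying assumptions: `analyze` is only a rough stand-in for the default analyzer (lowercase, split on non-word characters), and the function names are hypothetical.

```python
import re

def analyze(text):
    """Rough stand-in for the default analyzer: lowercase and split into words."""
    return re.findall(r"\w+", text.lower())

def match_bool_prefix(field_value, query):
    """Last query term is matched as a prefix, earlier terms as exact terms (OR)."""
    doc_terms = analyze(field_value)
    query_terms = analyze(query)
    if not query_terms:
        return False
    *exact_terms, prefix = query_terms
    exact_hit = any(term in doc_terms for term in exact_terms)
    prefix_hit = any(doc_term.startswith(prefix) for doc_term in doc_terms)
    return exact_hit or prefix_hit

# The three item_pk values indexed in the working example.
items = ["SPRINHO2H", "SCMLPLWVI", "SSVRKEN2Z"]
print([v for v in items if match_bool_prefix(v, "s")])   # all three values
print([v for v in items if match_bool_prefix(v, "sp")])  # ['SPRINHO2H']
```

Because the analyzer lowercases both the indexed value and the query, the lowercase query sp still matches the uppercase value SPRINHO2H.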
Update 2:
Try this query:
{
"query": {
"bool": {
"must": [
{
"match": {
"hierarchy_name": "PRODUCT"
}
},
{
"match": {
"attribute_name": "SUB_FAMILY"
}
},
{
"match_bool_prefix": {
"item_pk": "sp"
}
}
]
}
}
}