Partial search returns zero hits
I can run exact-match searches successfully with Elasticsearch (v6.1.3). But when I try a partial or case-insensitive search, for example {"query": {"match": {"demodata": "Hello"}}}
or {"query": {"match": {"demodata": "ell"}}}
, I get zero hits. I don't know why. I set up my analyzer based on this hint:
Partial search
from elasticsearch import Elasticsearch

es = Elasticsearch()

settings = {
    "mappings": {
        "my-type": {
            "properties": {
                "demodata": {
                    "type": "string",
                    "search_analyzer": "search_ngram",
                    "index_analyzer": "index_ngram"
                }
            }
        }
    },
    "settings": {
        "analysis": {
            "filter": {
                "ngram_filter": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 8
                }
            },
            "analyzer": {
                "index_ngram": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["ngram_filter", "lowercase"]
                },
                "search_ngram": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": "lowercase"
                }
            }
        }
    }
}

es.indices.create(index="my-index", body=settings, ignore=400)

docs = [
    {"demodata": "hello"},
    {"demodata": "hi"},
    {"demodata": "bye"},
    {"demodata": "HelLo WoRld!"}
]

for doc in docs:
    res = es.index(index="my-index", doc_type="my-type", body=doc)

res = es.search(index="my-index", body={"query": {"match": {"demodata": "Hello"}}})
print("Got %d Hits:" % res["hits"]["total"])
print(res)
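For reference, a minimal debugging sketch (not part of the original snippet) that checks what the index-time analyzer actually produces via the _analyze API. It assumes the index above was created successfully; since the create call passes ignore=400, a rejected mapping (for example the "string" type and "index_analyzer" option, which Elasticsearch 6.x no longer accepts) would be hidden and the field would silently fall back to the standard analyzer:

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Ask the index to analyze a sample string with the custom index-time analyzer.
# If the index creation above was rejected, this call fails (or the analyzer is
# unknown), which would explain why the ngram search gets zero hits.
tokens = es.indices.analyze(
    index="my-index",
    body={"analyzer": "index_ngram", "text": "HelLo WoRld!"}
)
print([t["token"] for t in tokens["tokens"]])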
I updated the code based on Piotr Pradzynski's input, but it still does not work!!!
from elasticsearch import Elasticsearch

es = Elasticsearch()

if not es.indices.exists(index="my-index"):
    customset = {
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "tokenizer": "my_tokenizer"
                    }
                },
                "tokenizer": {
                    "my_tokenizer": {
                        "type": "ngram",
                        "min_gram": 3,
                        "max_gram": 20,
                        "token_chars": [
                            "letter",
                            "digit"
                        ]
                    }
                }
            }
        }
    }
    es.indices.create(index="my-index", body=customset, ignore=400)

docs = [
    {"demodata": "hELLO"},
    {"demodata": "hi"},
    {"demodata": "bye"},
    {"demodata": "HeLlo WoRld!"},
    {"demodata": "xyz@abc.com"}
]

for doc in docs:
    res = es.index(index="my-index", doc_type="my-type", body=doc)

es.indices.refresh(index="my-index")

res = es.search(
    index="my-index",
    body={"query": {"match": {"demodata": {"query": "ell", "analyzer": "my_analyzer"}}}}
)
print("Got %d Hits:" % res["hits"]["total"])
print(res)
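A hedged sketch of one way to see why this still returns zero hits: the updated settings only define my_analyzer but never map it onto the demodata field, so the field is dynamically mapped and indexed with the standard analyzer, while the ngram analyzer is only applied to the query string. The resulting mapping can be inspected like this (index, type and field names taken from the snippet above):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Print the mapping Elasticsearch derived for the demodata field.
# Without an "analyzer": "my_analyzer" entry here, documents are tokenized
# by the standard analyzer at index time, so the "ell" ngrams produced at
# search time have nothing to match against.
mapping = es.indices.get_mapping(index="my-index", doc_type="my-type")
print(mapping["my-index"]["mappings"]["my-type"]["properties"]["demodata"])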
What you need is a query_string search.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
{
  "query": {
    "query_string": {
      "query": "demodata: *ell*"
    }
  }
}
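With the Python client from the question, this would look roughly as follows (a sketch reusing the es client and index from the question; note that wildcard patterns such as *ell* are not analyzed, and leading wildcards can be slow on large indices):

res = es.search(
    index="my-index",
    body={
        "query": {
            "query_string": {
                "query": "demodata: *ell*"
            }
        }
    }
)
print("Got %d Hits:" % res["hits"]["total"])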
I think you should use the NGram Tokenizer instead of the NGram Token Filter, and add a multi-field that uses this tokenizer.
Something like this:
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_analyzer": {
          "tokenizer": "ngram_tokenizer",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 15,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "my-type": {
      "properties": {
        "demodata": {
          "type": "text",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "ngram_analyzer",
              "search_analyzer": "standard"
            }
          }
        }
      }
    }
  }
}
Then you have to use the added multi-field demodata.ngram in your search:
res = es.search(index="my-index", body={"query": {"match": {"demodata.ngram": "Hello"}}})
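Put together with the Python client from the question, a minimal end-to-end sketch could look like this (same index, type and field names as above; the settings body is the PUT payload from this answer, and the index is recreated so the new mapping actually takes effect):

from elasticsearch import Elasticsearch

es = Elasticsearch()

settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase", "asciifolding"]
                }
            },
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 15,
                    "token_chars": ["letter", "digit"]
                }
            }
        }
    },
    "mappings": {
        "my-type": {
            "properties": {
                "demodata": {
                    "type": "text",
                    "fields": {
                        "ngram": {
                            "type": "text",
                            "analyzer": "ngram_analyzer",
                            "search_analyzer": "standard"
                        }
                    }
                }
            }
        }
    }
}

# Drop any old index so the new mapping is applied, then index and search.
if es.indices.exists(index="my-index"):
    es.indices.delete(index="my-index")
es.indices.create(index="my-index", body=settings)

for doc in [{"demodata": "hELLO"}, {"demodata": "HeLlo WoRld!"}]:
    es.index(index="my-index", doc_type="my-type", body=doc)
es.indices.refresh(index="my-index")

res = es.search(index="my-index",
                body={"query": {"match": {"demodata.ngram": "ell"}}})
print("Got %d Hits:" % res["hits"]["total"])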