Create a custom analyzer filter for an index in py-elasticsearch-dsl
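I am using py-elasticsearch-dsl for my master's work. I am building an index of title documents from a corpus of Turkish titles, and I need to implement a custom lowercase analyzer for Turkish: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lowercase-tokenfilter.html#analysis-lowercase-tokenfilter
This is what I am trying: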
from elasticsearch_dsl import DocType, Percolator, analysis, analyzer

turkish = analysis.token_filter('turkish_lowercase', type="lowercase", language="turkish")
turkish_lowercase = analyzer('turkish_lowercase',
    type="custom",
    tokenizer="standard",
    filter=["turkish_lowercase"],
)

class Document(DocType):
    # title = Text()
    query = Percolator(
        analyzer=turkish_lowercase,
        filter=turkish
    )  # query is a percolator

    class Meta:
        index = 'titles'  # index name
        doc_type = '_doc'

    def save(self, **kwargs):
        return super(Document, self).save(**kwargs)
But I get this error:
$ python percolator.py
PUT http://localhost:9200/title-index [status:400 request:0.004s]
Traceback (most recent call last):
  File "percolator.py", line 55, in <module>
    Document.init()
  File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch_dsl/document.py", line 161, in init
    cls._doc_type.init(index, using)
  File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch_dsl/document.py", line 85, in init
    self.mapping.save(index or self.index, using=using or self.using)
  File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch_dsl/mapping.py", line 116, in save
    return index.save()
  File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch_dsl/index.py", line 219, in save
    return self.create()
  File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch_dsl/index.py", line 203, in create
    self.connection.indices.create(index=self._name, body=self.to_dict(), **kwargs)
  File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch/client/indices.py", line 91, in create
    params=params, body=body)
  File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch/transport.py", line 314, in perform_request
    status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
  File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 163, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/salahaddin/Proyectos/Works/seminer/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'illegal_argument_exception', 'Custom Analyzer [turkish_lowercase] failed to find filter under name [turkish_lowercase]')
So, what is the correct way to do this?
Thanks
To create a custom filter, we can use token_filter:
turkish = analysis.token_filter('turkish_lowercase', type="lowercase", language="turkish")
This creates a new lowercase filter for the Turkish language. Next, we need to create the analyzer:
turkish_lowercase = analyzer('turkish_lowercase',
    type="custom",
    tokenizer="standard",
    filter=[turkish],
)
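We pass the turkish filter object itself in the filter parameter, rather than its name as a string, and that is all. We can call get_definition on the filter, as well as on the analyzer, to see the resulting dictionary.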
Finally, we apply the analyzer in the Document:
class Document(DocType):
    title = Text(
        analyzer=turkish_lowercase,
        # filter=turkish
    )
    query = Percolator(
    )  # query is a percolator

    class Meta:
        index = 'titles'  # index name
        doc_type = '_doc'

    def save(self, **kwargs):
        return super(Document, self).save(**kwargs)
We will get the following result:
{
  "titles": {
    "aliases": {},
    "mappings": {
      "_doc": {
        "properties": {
          "query": {
            "type": "percolator"
          },
          "title": {
            "type": "text",
            "analyzer": "turkish_lowercase"
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "titles",
        "analysis": {
          "filter": {
            "turkish_lowercase": {
              "type": "lowercase",
              "language": "turkish"
            }
          },
          "analyzer": {
            "turkish_lowercase": {
              "filter": [
                "turkish_lowercase"
              ],
              "type": "custom",
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "1"
      }
    }
  }
}