Adding new documents to Elasticsearch with a mapping via elasticsearch.index and the body structure
I'm building a blog-like application with Flask (based on Miguel Grinberg's Megatutorial) and I'm trying to set up an ES index that supports autocomplete. I'm struggling to get the index set up correctly.
I started from a simple indexing mechanism that works:
from flask import current_app

def add_to_index(index, model):
    if not current_app.elasticsearch:
        return
    payload = {}
    for field in model.__searchable__:
        payload[field] = getattr(model, field)
    current_app.elasticsearch.index(index=index, id=model.id, body=payload)
After some time with Google I figured my body could look something like this (possibly with fewer analyzers, but I copied it exactly as I found it somewhere, where the author claimed it worked):
{
    "settings": {
        "index": {
            "analysis": {
                "filter": {},
                "analyzer": {
                    "keyword_analyzer": {
                        "filter": ["lowercase", "asciifolding", "trim"],
                        "char_filter": [],
                        "type": "custom",
                        "tokenizer": "keyword"
                    },
                    "edge_ngram_analyzer": {
                        "filter": ["lowercase"],
                        "tokenizer": "edge_ngram_tokenizer"
                    },
                    "edge_ngram_search_analyzer": {
                        "tokenizer": "lowercase"
                    }
                },
                "tokenizer": {
                    "edge_ngram_tokenizer": {
                        "type": "edge_ngram",
                        "min_gram": 2,
                        "max_gram": 5,
                        "token_chars": ["letter"]
                    }
                }
            }
        }
    },
    "mappings": {
        field: {
            "properties": {
                "name": {
                    "type": "text",
                    "fields": {
                        "keywordstring": {
                            "type": "text",
                            "analyzer": "keyword_analyzer"
                        },
                        "edgengram": {
                            "type": "text",
                            "analyzer": "edge_ngram_analyzer",
                            "search_analyzer": "edge_ngram_search_analyzer"
                        },
                        "completion": {
                            "type": "completion"
                        }
                    },
                    "analyzer": "standard"
                }
            }
        }
    }
}
I found that I could modify the original mechanism into:
for field in model.__searchable__:
    temp = getattr(model, field)
    fields[field] = {
        "properties": {
            "type": "text",
            "fields": {
                "keywordstring": {
                    "type": "text",
                    "analyzer": "keyword_analyzer"
                },
                "edgengram": {
                    "type": "text",
                    "analyzer": "edge_ngram_analyzer",
                    "search_analyzer": "edge_ngram_search_analyzer"
                },
                "completion": {
                    "type": "completion"
                }
            },
            "analyzer": "standard"
        }
    }

payload = {
    "settings": {
        "index": {
            "analysis": {
                "filter": {},
                "analyzer": {
                    "keyword_analyzer": {
                        "filter": ["lowercase", "asciifolding", "trim"],
                        "char_filter": [],
                        "type": "custom",
                        "tokenizer": "keyword"
                    },
                    "edge_ngram_analyzer": {
                        "filter": ["lowercase"],
                        "tokenizer": "edge_ngram_tokenizer"
                    },
                    "edge_ngram_search_analyzer": {
                        "tokenizer": "lowercase"
                    }
                },
                "tokenizer": {
                    "edge_ngram_tokenizer": {
                        "type": "edge_ngram",
                        "min_gram": 2,
                        "max_gram": 5,
                        "token_chars": ["letter"]
                    }
                }
            }
        }
    },
    "mappings": fields
}
But this is where I get lost. Where should I put the actual content (temp = getattr(model, field)) in the document so that the whole thing works? I can't find any example or relevant part of the docs that covers updating an index with a slightly more complex mapping, and is this even correct/doable? Every guide I've seen covers bulk indexing, but somehow I can't make the connection.
I think you're getting a bit mixed up, so let me try to explain. What you want is to add a document to Elasticsearch:
current_app.elasticsearch.index(index=index, id=model.id, body=payload)
using the index() method defined in the elasticsearch-py library.
Check the example here:
https://elasticsearch-py.readthedocs.io/en/master/index.html#example-usage
The body must simply be your document, a plain dictionary, as shown in the example in the docs.
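For instance, a minimal sketch of what such a body could look like (the "title" and "body" field names and the "posts" index are just assumptions here, not taken from your model):

# Hypothetical document body: just field names and their values, nothing else.
doc = {"title": "My first post", "body": "Some text to make searchable"}
current_app.elasticsearch.index(index="posts", id=1, body=doc)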
What you are setting up is something different: index settings. To use a database analogy, you have put a table's schema inside the document itself.
If you want to apply those settings, you need to do so with put_settings, as defined here:
https://elasticsearch-py.readthedocs.io/en/master/api.html?highlight=settings#elasticsearch.client.ClusterClient.put_settings
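Put differently, you would normally create the index once with the settings and mappings, and only then index plain documents into it. Here is a rough sketch of how that could look with elasticsearch-py (the indices.create call, the "title"/"body" field names, and the ES 7.x-style typeless mapping are my assumptions, so adapt them to your setup):

from flask import current_app

def create_index(index):
    # Create the index once, with the analysis settings and field mappings.
    # The analyzers/tokenizer come from the question; "title"/"body" are assumed field names.
    es = current_app.elasticsearch
    if es.indices.exists(index=index):
        return
    es.indices.create(index=index, body={
        "settings": {
            "analysis": {
                "analyzer": {
                    "edge_ngram_analyzer": {
                        "filter": ["lowercase"],
                        "tokenizer": "edge_ngram_tokenizer"
                    },
                    "edge_ngram_search_analyzer": {"tokenizer": "lowercase"}
                },
                "tokenizer": {
                    "edge_ngram_tokenizer": {
                        "type": "edge_ngram",
                        "min_gram": 2,
                        "max_gram": 5,
                        "token_chars": ["letter"]
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "edge_ngram_analyzer",
                    "search_analyzer": "edge_ngram_search_analyzer"
                },
                "body": {"type": "text"}
            }
        }
    })

def add_to_index(index, model):
    # The document body stays a plain dict of field values,
    # exactly like your original add_to_index().
    if not current_app.elasticsearch:
        return
    payload = {field: getattr(model, field) for field in model.__searchable__}
    current_app.elasticsearch.index(index=index, id=model.id, body=payload)

The point is that the settings and mappings go into the index-creation call, while index() only ever receives the document itself.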
Hope this helps.