How to get objects from root to level 1 descendant in Elasticsearch by using path_hierarchy?
I am using path_hierarchy to maintain a tree structure in Elasticsearch:
PUT file-path-test
{
"settings": {
"analysis": {
"analyzer": {
"custom_path_tree": {
"tokenizer": "custom_hierarchy"
},
"custom_path_tree_reversed": {
"tokenizer": "custom_hierarchy_reversed"
}
},
"tokenizer": {
"custom_hierarchy": {
"type": "path_hierarchy",
"delimiter": "/"
},
"custom_hierarchy_reversed": {
"type": "path_hierarchy",
"delimiter": "/",
"reverse": "true"
}
}
}
},
"mappings": {
"properties": {
"file_path": {
"type": "text",
"fields": {
"tree": {
"type": "text",
"analyzer": "custom_path_tree"
},
"tree_reversed": {
"type": "text",
"analyzer": "custom_path_tree_reversed"
}
}
}
}
}
}
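To see what the two subfields index, here is a simplified Python model of the path_hierarchy tokenizer with delimiter "/", in forward mode and in reverse: true mode. It is only an approximation for illustration. The function names are hypothetical, and tokenizer options such as skip, replacement and buffer_size are ignored. The _analyze API on the real index (for example GET file-path-test/_analyze with "analyzer": "custom_path_tree") is the authoritative check.

```python
def path_hierarchy(path: str, delimiter: str = "/") -> list[str]:
    """Forward mode (simplified): one token per ancestor prefix, plus the full path."""
    tokens = []
    for i, ch in enumerate(path):
        # a delimiter at position 0 (the leading "/") does not end a prefix
        if ch == delimiter and i > 0:
            tokens.append(path[:i])
    tokens.append(path)
    return tokens


def reverse_path_hierarchy(path: str, delimiter: str = "/") -> list[str]:
    """Reverse mode (simplified): the full path, then every suffix after a delimiter."""
    tokens = [path]
    for i, ch in enumerate(path):
        if ch == delimiter and i < len(path) - 1:
            tokens.append(path[i + 1:])
    return tokens


if __name__ == "__main__":
    p = "/folder1/folder2/folder3"
    print(path_hierarchy(p))
    # ['/folder1', '/folder1/folder2', '/folder1/folder2/folder3']
    print(reverse_path_hierarchy(p))
    # ['/folder1/folder2/folder3', 'folder1/folder2/folder3', 'folder2/folder3', 'folder3']
```

The forward tokens are what file_path.tree stores. This is the property the rest of the thread depends on: every ancestor path of a document appears as its own term.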
Below are the insert requests:
POST file-path-test/_doc/6
{
"file_path": "/folder1"
}
POST file-path-test/_doc/7
{
"file_path": "/folder1/folder2"
}
POST file-path-test/_doc/8
{
"file_path": "/folder1/folder2/folder3"
}
POST file-path-test/_doc/9
{
"file_path": "/folder1/folder2/folder3/folder4"
}
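The four POST requests above could also be sent as a single _bulk request. The Python sketch below only builds the newline-delimited JSON body and does not send it. bulk_body is a hypothetical helper. The result could be pasted after POST _bulk in Kibana Dev Tools, or sent by any HTTP client with Content-Type: application/x-ndjson.

```python
import json


def bulk_body(index: str, docs: dict[str, str]) -> str:
    """Build an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, file_path in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"file_path": file_path}))
    # the _bulk API expects the body to end with a newline
    return "\n".join(lines) + "\n"


docs = {
    "6": "/folder1",
    "7": "/folder1/folder2",
    "8": "/folder1/folder2/folder3",
    "9": "/folder1/folder2/folder3/folder4",
}
print(bulk_body("file-path-test", docs), end="")
```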
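I want a query that returns only the objects from the root level down to the level-1 descendants of a given path.
For example, given the path '/folder1/folder2', the query should return: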
/folder1
/folder1/folder2
/folder1/folder2/folder3
It should not include the following value:
/folder1/folder2/folder3/folder4
The query below returns values from the whole tree instead, i.e. from the root node down to the leaf nodes:
GET file-path-test/_search
{
"query": {
"term": {
"file_path.tree": "/folder1/folder2"
}
}
}
Result:
/folder1
/folder1/folder2
/folder1/folder2/folder3
/folder1/folder2/folder3/folder4
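At query time, the file path needs to be converted into a comparable list of sub-paths. This is the same list that your custom_path_tree
analyzer produces at index time.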
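Here is a Python sketch of that conversion. It mirrors the loop in the script query further down: split on "/", drop empty segments, and accumulate prefixes. The function name query_path_tree is hypothetical.

```python
def query_path_tree(folder_path_query: str) -> list[str]:
    """Rebuild the ancestor prefixes of a path, as custom_path_tree would index them."""
    tree = []
    last_appended_folder = ""
    for folder in folder_path_query.split("/"):
        # empty segments come from the leading "/" (and from a trailing "/" or "//")
        if folder:
            last_appended_folder += "/" + folder
            tree.append(last_appended_folder)
    return tree


print(query_path_tree("/folder1/folder2"))  # ['/folder1', '/folder1/folder2']
```

The loop always prepends "/". Its output therefore only lines up with the indexed tokens when the stored file_path values start with "/", as all four sample documents do.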
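The heavy lifting can happen in a script query. However, text fields have no doc values, so to read the file_path.tree
values from such a script you need a small but important mapping change: enable fielddata on the tree subfield (the line marked <---):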
PUT file-path-test
{
"settings": {
"analysis": {
...
}
},
"mappings": {
"properties": {
"file_path": {
"type": "text",
"fields": {
"tree": {
"type": "text",
"fielddata": true, <---
"analyzer": "custom_path_tree"
},
...
}
}
}
}
}
Once that is done, use the following script query:
GET file-path-test/_search
{
"query": {
"script": {
"script": {
"source": """
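def doc_tree = doc['file_path.tree'];
// skip documents with no indexed path: allMatch() on an empty stream returns true
if (doc_tree.size() == 0) {
return false;
}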
def folder_path_query = params.folder_path_query;
def query_path_tree = [];
def last_appended_folder = '';
for (folder in /\//.split(folder_path_query)) {
def folder_str = folder.toString();
if (folder_str.length() > 0) {
last_appended_folder += '/'+ folder_str;
query_path_tree.add(last_appended_folder);
}
}
def doc_in_query = doc_tree.stream().allMatch(folder -> query_path_tree.contains(folder));
def query_in_doc = query_path_tree.stream().allMatch(folder -> doc_tree.contains(folder));
def doc_tree_size = doc_tree.size();
def query_path_tree_size = query_path_tree.size();
// +1 for the closest descendant
if (doc_tree_size <= query_path_tree_size + 1) {
return doc_in_query || query_in_doc
}
return false
""",
"params": {
"folder_path_query": "/folder1/folder2"
}
}
}
}
}
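All it does is rebuild the directory tree of the folder_path_query parameter so that the values of the two trees (arrays) can be compared as streams.
A document matches in either of two cases. In the first, every one of its sub-paths appears in the query's tree, which makes it the query path or one of its ancestors. In the second, every one of the query's sub-paths appears in its tree, which makes it the query path or a descendant. In both cases the document must also be at most one level deeper than the query path.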
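To check the logic without a cluster, here is a Python port of the script's decision, run against the four sample documents. It is a sketch: path_tree and script_matches are hypothetical names. path_tree assumes that the "/"-rooted prefixes it builds equal what file_path.tree holds for each document.

```python
def path_tree(path: str) -> list[str]:
    """Ancestor prefixes of a '/'-rooted path (what custom_path_tree indexes)."""
    tree, prefix = [], ""
    for folder in path.split("/"):
        if folder:
            prefix += "/" + folder
            tree.append(prefix)
    return tree


def script_matches(doc_tree: list[str], query_tree: list[str]) -> bool:
    """Port of the Painless script's decision for one document."""
    doc_in_query = all(f in query_tree for f in doc_tree)  # query path or an ancestor
    query_in_doc = all(f in doc_tree for f in query_tree)  # query path or a descendant
    # +1 for the closest descendant
    if len(doc_tree) <= len(query_tree) + 1:
        return doc_in_query or query_in_doc
    return False


docs = {
    "6": "/folder1",
    "7": "/folder1/folder2",
    "8": "/folder1/folder2/folder3",
    "9": "/folder1/folder2/folder3/folder4",
}
query_tree = path_tree("/folder1/folder2")
hits = [doc_id for doc_id, path in docs.items() if script_matches(path_tree(path), query_tree)]
print(hits)  # ['6', '7', '8']
```

Sibling paths such as /folder1/folderX fail both tests and are excluded. Python's all() on an empty list is True, just like allMatch on an empty stream. A document with no file_path tokens would therefore pass the doc_in_query test unless it is guarded by a size check.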