How to get objects from the root down to level-1 descendants in Elasticsearch using path_hierarchy?

I'm using path_hierarchy in Elasticsearch to maintain a tree structure:

PUT file-path-test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_path_tree": {
          "tokenizer": "custom_hierarchy"
        },
        "custom_path_tree_reversed": {
          "tokenizer": "custom_hierarchy_reversed"
        }
      },
      "tokenizer": {
        "custom_hierarchy": {
          "type": "path_hierarchy",
          "delimiter": "/"
        },
        "custom_hierarchy_reversed": {
          "type": "path_hierarchy",
          "delimiter": "/",
          "reverse": "true"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "file_path": {
        "type": "text",
        "fields": {
          "tree": {
            "type": "text",
            "analyzer": "custom_path_tree"
          },
          "tree_reversed": {
            "type": "text",
            "analyzer": "custom_path_tree_reversed"
          }
        }
      }
    }
  }
}
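To see what the custom_path_tree analyzer stores for each path, here is a rough Python model of the path_hierarchy tokenizer with "delimiter": "/". It is a simplified sketch, not Elasticsearch's implementation: the function name path_hierarchy_tokens is hypothetical, and edge cases such as trailing or repeated delimiters, skip, replacement and reverse are not modelled.

```python
from typing import List


def path_hierarchy_tokens(path: str, delimiter: str = "/") -> List[str]:
    """Rough model of the forward path_hierarchy tokenizer (no skip/replacement).

    Emits every prefix of `path` that ends just before a delimiter, plus the
    full path. A delimiter at position 0 does not produce an empty token.
    """
    tokens = []
    for i, ch in enumerate(path):
        if ch == delimiter and i > 0:
            tokens.append(path[:i])
    tokens.append(path)
    return tokens


if __name__ == "__main__":
    for p in ["/folder1", "/folder1/folder2", "/folder1/folder2/folder3"]:
        print(p, "->", path_hierarchy_tokens(p))
```

Under this model, /folder1/folder2/folder3 is indexed in file_path.tree as /folder1, /folder1/folder2 and /folder1/folder2/folder3, i.e. one token per ancestor plus the path itself.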

Here are the insert requests:

POST file-path-test/_doc/6
{
  "file_path": "/folder1"
}

POST file-path-test/_doc/7
{
  "file_path": "/folder1/folder2"
}

POST file-path-test/_doc/8
{
  "file_path": "/folder1/folder2/folder3"
}

POST file-path-test/_doc/9
{
  "file_path": "/folder1/folder2/folder3/folder4"
}

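So I want the query to return only the objects from the root level down to the level-1 descendants of a given path. For example, if I pass the path '/folder1/folder2', the query should return: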

/folder1
/folder1/folder2
/folder1/folder2/folder3

It should not include the following value:

/folder1/folder2/folder3/folder4
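Stated as a rule, the wanted result is the query path itself, every ancestor of it, and every path exactly one level below it. A small Python sketch of that selection rule, working directly on path strings (the helper names segments and is_wanted are hypothetical, and paths are assumed to be absolute with no trailing slash):

```python
from typing import List


def segments(path: str) -> List[str]:
    # "/folder1/folder2" -> ["folder1", "folder2"]
    return [s for s in path.split("/") if s]


def is_wanted(doc_path: str, query_path: str) -> bool:
    doc, query = segments(doc_path), segments(query_path)
    # Ancestor of (or equal to) the query path: doc is a prefix of query.
    if query[: len(doc)] == doc:
        return True
    # Direct child: query is a prefix of doc and doc is exactly one level deeper.
    return len(doc) == len(query) + 1 and doc[: len(query)] == query


paths = [
    "/folder1",
    "/folder1/folder2",
    "/folder1/folder2/folder3",
    "/folder1/folder2/folder3/folder4",
]

if __name__ == "__main__":
    print([p for p in paths if is_wanted(p, "/folder1/folder2")])
```

Siblings such as /folder1/other are excluded, because neither path is a prefix of the other.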

The query below returns the values of the whole tree, i.e. from the root node down to the leaf nodes:

GET file-path-test/_search
{
  "query": {
    "term": {
      "file_path.tree": "/folder1/folder2"
    }
  }
}

Result:

/folder1
/folder1/folder2
/folder1/folder2/folder3
/folder1/folder2/folder3/folder4
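Why a single query on file_path.tree returns the whole subtree: a term query is not analyzed, so it matches every document whose indexed tokens include the exact string /folder1/folder2, which is true for that path and for every path below it. The Python sketch below simulates this over the four sample documents with a simplified model of the path_hierarchy tokenizer (all function names are hypothetical). It also models a match query, which runs the query text through the field's analyzer first and, with the default OR operator, matches any document sharing at least one token. Under this model the term query returns documents 7, 8 and 9, while a match query would also return document 6 (/folder1), which would be consistent with the result list above.

```python
from typing import Dict, List, Set


def path_hierarchy_tokens(path: str) -> List[str]:
    # Simplified model of the path_hierarchy tokenizer with delimiter "/".
    tokens = [path[:i] for i, ch in enumerate(path) if ch == "/" and i > 0]
    tokens.append(path)
    return tokens


# The four sample documents, keyed by _id.
docs: Dict[int, str] = {
    6: "/folder1",
    7: "/folder1/folder2",
    8: "/folder1/folder2/folder3",
    9: "/folder1/folder2/folder3/folder4",
}
# What file_path.tree would hold for each document.
index: Dict[int, Set[str]] = {i: set(path_hierarchy_tokens(p)) for i, p in docs.items()}


def term_query(value: str) -> List[int]:
    # term: the value is used verbatim, without analysis.
    return sorted(i for i, toks in index.items() if value in toks)


def match_query(text: str) -> List[int]:
    # match (default OR operator): analyze the text, then any shared token matches.
    query_tokens = set(path_hierarchy_tokens(text))
    return sorted(i for i, toks in index.items() if toks & query_tokens)


if __name__ == "__main__":
    print("term :", term_query("/folder1/folder2"))
    print("match:", match_query("/folder1/folder2"))
```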

At query time, the file path needs to be converted into a comparable list of sub-paths, the same list that your custom_path_tree analyzer produces.

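The heavy lifting can happen in a script query, but to access the file_path.tree field from such a script, a small but important mapping change is needed. Since text fields have no doc values, doc['file_path.tree'] only works once fielddata is enabled on that sub-field (at the cost of loading its terms into heap memory):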

PUT file-path-test
{
  "settings": {
    "analysis": {
      ...
    }
  },
  "mappings": {
    "properties": {
      "file_path": {
        "type": "text",
        "fields": {
          "tree": {
            "type": "text",
            "fielddata": true,                  <---
            "analyzer": "custom_path_tree"
          },
          ...
        }
      }
    }
  }
}

Once that's done, use the following script query:

GET file-path-test/_search
{
  "query": {
    "script": {
      "script": {
        "source": """
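          // Sub-paths indexed for this document, e.g. ["/folder1", "/folder1/folder2"]
          def doc_tree = doc['file_path.tree'];

          // Rebuild the same list of sub-paths for the query path,
          // mirroring what the custom_path_tree analyzer produces
          def folder_path_query = params.folder_path_query;
          def query_path_tree = [];
          def last_appended_folder = '';
          for (folder in /\//.split(folder_path_query)) {
            def folder_str = folder.toString();
            // Skip the empty segment produced by the leading '/'
            if (folder_str.length() > 0) {
              last_appended_folder += '/' + folder_str;
              query_path_tree.add(last_appended_folder);
            }
          }

          // All doc sub-paths are in the query tree: the doc is the query path or an ancestor
          def doc_in_query = doc_tree.stream().allMatch(folder -> query_path_tree.contains(folder));
          // All query sub-paths are in the doc tree: the doc is the query path or a descendant
          def query_in_doc = query_path_tree.stream().allMatch(folder -> doc_tree.contains(folder));

          def doc_tree_size = doc_tree.size();
          def query_path_tree_size = query_path_tree.size();

          // +1 lets through the closest descendants (one level below the query path)
          if (doc_tree_size <= query_path_tree_size + 1) {
            return doc_in_query || query_in_doc;
          }

          return false;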
        """,
        "params": {
          "folder_path_query": "/folder1/folder2"
        }
      }
    }
  }
}

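All it does is rebuild the directory tree of the folder_path_query parameter so that the values of the two trees (arrays) can be compared as streams. A document matches if either tree contains the other (so the document is the query path, one of its ancestors, or one of its descendants) and the document's tree has at most one more element than the query's, which limits descendants to the first level. Depending on your Elasticsearch version, the regex literal /\// may require Painless regexes to be enabled via script.painless.regex.enabled.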
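The matching rule in the script can be checked outside Elasticsearch. Below is a rough Python port (not Painless, and not run against a real cluster): the document's tree stands in for doc['file_path.tree'] and is built with a simplified model of the path_hierarchy tokenizer; the function names path_hierarchy_tokens, build_query_tree and script_matches are hypothetical.

```python
from typing import List


def path_hierarchy_tokens(path: str) -> List[str]:
    # Simplified model of what the custom_path_tree analyzer indexes.
    tokens = [path[:i] for i, ch in enumerate(path) if ch == "/" and i > 0]
    tokens.append(path)
    return tokens


def build_query_tree(folder_path_query: str) -> List[str]:
    # Same loop as the script: split on "/", skip empty segments, accumulate prefixes.
    query_path_tree, last_appended_folder = [], ""
    for folder in folder_path_query.split("/"):
        if folder:
            last_appended_folder += "/" + folder
            query_path_tree.append(last_appended_folder)
    return query_path_tree


def script_matches(file_path: str, folder_path_query: str) -> bool:
    doc_tree = path_hierarchy_tokens(file_path)
    query_path_tree = build_query_tree(folder_path_query)
    # doc is the query path or one of its ancestors
    doc_in_query = all(f in query_path_tree for f in doc_tree)
    # doc is the query path or one of its descendants
    query_in_doc = all(f in doc_tree for f in query_path_tree)
    # +1 for the closest descendant
    if len(doc_tree) <= len(query_path_tree) + 1:
        return doc_in_query or query_in_doc
    return False


paths = [
    "/folder1",
    "/folder1/folder2",
    "/folder1/folder2/folder3",
    "/folder1/folder2/folder3/folder4",
]

if __name__ == "__main__":
    print([p for p in paths if script_matches(p, "/folder1/folder2")])
```

The size check is what cuts off /folder1/folder2/folder3/folder4: its tree contains the whole query tree, but it has four sub-paths against the query's two.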