Using elasticsearch, how to create an index for a document that contains an array, and append to that array in the future

Question

My sample code uses the PHP client library, but anyone familiar with elasticsearch should be able to follow it.

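I'm using elasticsearch to create an index where each document contains an array of nGram-indexed authors. Initially a document has a single author, but over time more authors will be added to the array. Ideally, I could search by author name, and the document would be found if any author in the array matches.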

I've been trying to follow the documentation here for appending to the array and here for using the array type, but I haven't had any success with it.

First, I want to create an index for documents that contain a title, an array of authors, and an array of comments.

$client = new Client();
$params = [
    'index' => 'document',
    'body' => [
        'settings' => [
            // Simple settings for now, single shard
            'number_of_shards' => 1,
            'number_of_replicas' => 0,
            'analysis' => [
                'filter' => [
                    'shingle' => [
                        'type' => 'shingle'
                    ]
                ],
                'analyzer' => [
                    'my_ngram_analyzer' => [
                        'tokenizer' => 'my_ngram_tokenizer',
                        'filter' => 'lowercase',
                    ]
                ],
                // Allow searching for partial names with nGram
                'tokenizer' => [
                    'my_ngram_tokenizer' => [
                        'type' => 'nGram',
                        'min_gram' => 1,
                        'max_gram' => 15,
                        'token_chars' => ['letter', 'digit']
                    ]
                ]
            ]
        ],
        'mappings' => [
            '_default_' => [
                'properties' => [
                    'document_id' => [
                        'type' => 'string',
                        'index' => 'not_analyzed',
                    ],
                    // The document title
                    'title' => [
                        'type' => 'string',
                        'analyzer' => 'my_ngram_analyzer',
                        'term_vector' => 'yes',
                        'copy_to' => 'combined'
                    ],
                    'authors' => [
                        'type' => 'list',
                        'analyzer' => 'my_ngram_analyzer',
                        'term_vector' => 'yes',
                        'copy_to' => 'combined'
                    ],
                    'comments' => [
                        'type' => 'list',
                        'analyzer' => 'my_ngram_analyzer',
                        'term_vector' => 'yes',
                        'copy_to' => 'combined'
                    ],
                ]
            ],
        ]
    ]
];
// Create index `document` with ngram indexing
$client->indices()->create($params);

At first, I couldn't even create the index because of this error:

{"error":"MapperParsingException[mapping [_default_]]; nested: MapperParsingException[No handler for type [list] declared on field [authors]]; ","status":400}

If that had worked, I was planning to index a document that starts with empty authors and comments arrays, like this:

    $client = new Client();
    $params = array();
    $params['body']  = array('document_id' => 'id_here', 'title' => 'my_title', 'authors' => [], 'comments' => []);
    $params['index'] = 'document';
    $params['type']  = 'example_type';
    $params['id'] = 'id_here';
    $ret = $client->index($params);
    return $ret;

If I had the index needed to store this structure, that seems like it should work. My concern is appending something to the array with update. For example:

    $client = new Client();
    $params = array();
    //$params['body']  = array('person_id' => $person_id, 'emails' => [$email]);
    $params['index'] = 'document';
    $params['type']  = 'example_type';
    $params['id'] = 'id_here';
    $params['script'] = 'NO IDEA WHAT THIS SCRIPT SHOULD BE TO APPEND TO THE ARRAY';
    $ret = $client->update($params);
    return $ret;

I'm not sure how I would actually append something to the array and make sure it gets indexed.

Finally, another thing that confuses me is how to search on any of the authors in the array. Ideally, I could do something like this:

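But I'm not sure it would work 100%. Maybe I'm missing something fundamental about elasticsearch. I'm completely new to it, so any resources that would get me to the point where these small details don't trip me up would be greatly appreciated.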

Also, any direct advice on how to solve these problems with elasticsearch would be greatly appreciated.

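Sorry for the long post. To recap, I'm looking for advice on how to:

create an index that supports nGram analysis on all elements of an array
update that index to append to the array
search the now-updated index.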

Thanks for your help.

EDIT: Thanks to @astax, I can now create the index and append values to it as strings. However, there are two problems:

The array is stored as a string value, so a script like

$params['script'] = 'ctx._source.authors += [\'hello\']';

actually appends a string to [] instead of adding an element to an array of values.

The added values don't seem to be ngram-analyzed, so a search like:

$client = new Client();
$searchParams['index'] = 'document';
$searchParams['type']  = 'example_type';
$searchParams['body']['query']['match']['_all'] = 'hello';
$queryResponse = $client->search($searchParams);
print_r($queryResponse); // SUCCESS

finds the new value, but a search like:

$client = new Client();
$searchParams['index'] = 'document';
$searchParams['type']  = 'example_type';
$searchParams['body']['query']['match']['_all'] = 'hel';
$queryResponse = $client->search($searchParams);
print_r($queryResponse); // NO RESULTS

does not.

Answer 1

There is no "list" type in elasticsearch. Instead, you can use the "string" field type and store an array of values in it.

                ....
                'comments' => [
                    'type' => 'string',
                    'analyzer' => 'my_ngram_analyzer',
                    'term_vector' => 'yes',
                    'copy_to' => 'combined'
                ],
                ....

And index the document like this:

....
$params['body']  = array(
   'document_id' => 'id_here',
   'title' => 'my_title',
   'authors' => [],
   'comments' => ['comment1', 'comment2']);
....
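For reference, here is a minimal sketch of the whole create-index request with authors and comments mapped as string fields. It assumes Elasticsearch 1.x syntax (string, not_analyzed and _default_ were removed or changed in later versions) and only builds the request array and prints it as JSON, so it runs without a cluster. One part is my own assumption: the question copies into a combined field via copy_to but never declares it, so the sketch declares combined with the same nGram analyzer instead of letting it be mapped dynamically. The unused shingle filter is left out, and buildIndexParams() is just a hypothetical helper name.

```php
<?php
// Sketch (ES 1.x syntax): create-index body where array fields are plain
// `string` fields. Only builds and prints the request; no cluster needed.

function buildIndexParams(): array
{
    // Shared definition for every nGram-analyzed text field.
    $ngramText = [
        'type'        => 'string',
        'analyzer'    => 'my_ngram_analyzer',
        'term_vector' => 'yes',
        'copy_to'     => 'combined',
    ];

    return [
        'index' => 'document',
        'body'  => [
            'settings' => [
                'number_of_shards'   => 1,
                'number_of_replicas' => 0,
                'analysis' => [
                    'analyzer' => [
                        'my_ngram_analyzer' => [
                            'tokenizer' => 'my_ngram_tokenizer',
                            'filter'    => ['lowercase'],
                        ],
                    ],
                    'tokenizer' => [
                        'my_ngram_tokenizer' => [
                            'type'        => 'nGram',
                            'min_gram'    => 1,
                            'max_gram'    => 15,
                            'token_chars' => ['letter', 'digit'],
                        ],
                    ],
                ],
            ],
            'mappings' => [
                '_default_' => [
                    'properties' => [
                        'document_id' => ['type' => 'string', 'index' => 'not_analyzed'],
                        'title'       => $ngramText,
                        // No special array type: a string field holds one value or many.
                        'authors'     => $ngramText,
                        'comments'    => $ngramText,
                        // Assumption: explicit target for the copy_to above.
                        'combined'    => ['type' => 'string', 'analyzer' => 'my_ngram_analyzer'],
                    ],
                ],
            ],
        ],
    ];
}

$params = buildIndexParams();
echo json_encode($params['body'], JSON_PRETTY_PRINT), PHP_EOL;
// With the official client: $client->indices()->create($params);
```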

As for a script that appends elements to an array, this answer may help: Elasticsearch upserting and appending to array
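As a rough sketch of the update call, assuming Elasticsearch 1.x with Groovy as the script language and dynamic scripting enabled (it is disabled by default in some 1.x releases), the params could look like this. The new value is passed in params instead of being pasted into the script string, so quotes in the value cannot break the script. buildAppendAuthorParams() is a hypothetical helper name.

```php
<?php
// Sketch (ES 1.x, Groovy, dynamic scripting enabled): update() params that
// append one author to the `authors` array via a script parameter.

function buildAppendAuthorParams(string $id, string $author): array
{
    return [
        'index' => 'document',
        'type'  => 'example_type',
        'id'    => $id,
        'body'  => [
            // In Groovy, `list += element` adds one element to the list.
            'script' => 'ctx._source.authors += new_author',
            // Script parameters are exposed to the script by name.
            'params' => ['new_author' => $author],
        ],
    ];
}

$params = buildAppendAuthorParams('id_here', 'hello');
echo json_encode($params['body']), PHP_EOL;
// With the official client: $client->update($params);
```

This only appends if authors is already a JSON array in _source. If it was stored as a string, Groovy's + concatenates strings instead, which would match the symptom described in the question's edit.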

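But do you really need to update the document? It may be easier to reindex it, since that's exactly what Elasticsearch does internally: it reads the "_source" field, makes the necessary modifications, and reindexes the document. Incidentally, this means "_source" must be enabled and all of the document's properties must be included in it.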
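With the reindex approach, the append happens in PHP: fetch the document, change its _source, and index it again under the same id. A minimal sketch, with buildReindexParams() as a hypothetical helper; $source would come from something like $client->get([...])['_source'].

```php
<?php
// Sketch: "read, modify, reindex" instead of a scripted update.

function buildReindexParams(string $id, array $source, string $author): array
{
    $authors = $source['authors'] ?? [];
    if (!is_array($authors)) {
        // A single value stored as a scalar: wrap it so we keep it.
        $authors = [$authors];
    }
    $authors[] = $author;
    // array_values() keeps it a JSON list ([...]) rather than an object ({...}).
    $source['authors'] = array_values($authors);

    return [
        'index' => 'document',
        'type'  => 'example_type',
        'id'    => $id,
        'body'  => $source,
    ];
}

$source = ['document_id' => 'id_here', 'title' => 'my_title', 'authors' => [], 'comments' => []];
$params = buildReindexParams('id_here', $source, 'hello');
echo json_encode($params['body']), PHP_EOL;
// prints {"document_id":"id_here","title":"my_title","authors":["hello"],"comments":[]}
// With the official client: $client->index($params);
```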

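You could also consider storing comments and authors (as I understand it, these are the authors of the comments, not of the document) as child documents in ES and using the "has_child" filter.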
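A rough sketch of that parent/child layout, assuming Elasticsearch 1.x. The child type name comment and its author and text fields are hypothetical, and the child mapping has to be in place before any comment documents are indexed. Adding a comment then means indexing a new child document rather than updating an array, and has_child returns parents that have a matching child.

```php
<?php
// Sketch (ES 1.x): comments as child documents of `example_type`.
// Type `comment` and fields `author` / `text` are hypothetical names.

// Child mapping: `_parent` links each comment to a parent document.
$childMapping = [
    'index' => 'document',
    'type'  => 'comment',
    'body'  => [
        'comment' => [
            '_parent'    => ['type' => 'example_type'],
            'properties' => [
                'author' => ['type' => 'string', 'analyzer' => 'my_ngram_analyzer'],
                'text'   => ['type' => 'string'],
            ],
        ],
    ],
];
// With the client: $client->indices()->putMapping($childMapping);

// A new comment is a new child document pointing at its parent's id.
$newComment = [
    'index'  => 'document',
    'type'   => 'comment',
    'parent' => 'id_here',
    'body'   => ['author' => 'hello', 'text' => 'comment1'],
];
// With the client: $client->index($newComment);

// Parent documents having at least one comment whose author matches.
$search = [
    'index' => 'document',
    'type'  => 'example_type',
    'body'  => [
        'query' => [
            'filtered' => [
                'filter' => [
                    'has_child' => [
                        'type'  => 'comment',
                        'query' => ['match' => ['author' => 'hel']],
                    ],
                ],
            ],
        ],
    ],
];
// With the client: $client->search($search);

echo json_encode($search['body']), PHP_EOL;
```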

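I can't give you a specific solution, but I strongly recommend installing the Marvel plugin for ElasticSearch and using its "Sense" tool to check, step by step, how your whole process works.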

First, check that your tokenizer is configured correctly by running tests as described in http://www.elastic.co/guide/en/elasticsearch/reference/1.4/indices-analyze.html.
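To know what to look for in the _analyze output, here is a rough local approximation of the grams my_ngram_tokenizer plus the lowercase filter should produce with min_gram 1 and max_gram 15. It is ASCII-only and does not promise Elasticsearch's token order. The real check is the commented-out analyze call, which assumes the PHP client passes analyzer and text through to the _analyze endpoint; approximateNgrams() is a hypothetical helper. If "hel" shows up in the real output, the analyzer is fine and the problem lies in which field you search.

```php
<?php
// Rough approximation of my_ngram_tokenizer (min_gram 1, max_gram 15,
// token_chars letter/digit) followed by lowercase. ASCII only; order of
// tokens is not guaranteed to match Elasticsearch's.

function approximateNgrams(string $text, int $min = 1, int $max = 15): array
{
    $grams = [];
    // Anything that is not a letter or digit splits the input.
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $len = strlen($word);
        for ($start = 0; $start < $len; $start++) {
            for ($size = $min; $size <= $max && $start + $size <= $len; $size++) {
                $grams[] = substr($word, $start, $size);
            }
        }
    }
    return $grams;
}

// The real check against the server (PHP client, ES 1.x):
// print_r($client->indices()->analyze([
//     'index'    => 'document',
//     'analyzer' => 'my_ngram_analyzer',
//     'text'     => 'Hello',
// ]));

print_r(approximateNgrams('Hello'));
```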

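Then check that your update script does what you expect by retrieving the document with GET /document/example_type/some_existing_id. The authors and comments fields should be arrays, not strings.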
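The same check can be done from PHP. A small sketch with a hypothetical helper that takes a document's _source (for example $client->get([...])['_source']) and lists the fields that should be arrays but are not.

```php
<?php
// Hypothetical helper: report fields of `_source` that are missing or not arrays.

function nonArrayFields(array $source, array $fields = ['authors', 'comments']): array
{
    $bad = [];
    foreach ($fields as $field) {
        if (!array_key_exists($field, $source) || !is_array($source[$field])) {
            $bad[] = $field;
        }
    }
    return $bad;
}

// A correctly appended document: no problem fields.
print_r(nonArrayFields(['authors' => ['hello'], 'comments' => []]));
// authors stored as a string: reports 'authors'.
print_r(nonArrayFields(['authors' => 'hello', 'comments' => []]));
```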

Finally, run the search:

GET /document/_search
{ "query": { "match": { "_all": "hel" } } }

If you build the query yourself rather than taking it from user input, you can use query_string with wildcards:

GET /document/_search
{ "query": { "query_string": { "fields": ["_all"], "query": "hel*" } } }
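The two Sense queries above can be written as PHP client params, plus a variant that targets the authors field directly. The variant rests on an assumption about ES 1.x: _all is analyzed with its own analyzer (the standard one by default), not with my_ngram_analyzer, so a partial term like "hel" may not match there, while authors is analyzed with the nGram analyzer from its mapping. searchParams() is a hypothetical helper.

```php
<?php
// Sketch (ES 1.x): search bodies as PHP client params.

function searchParams(array $query): array
{
    return ['index' => 'document', 'type' => 'example_type', 'body' => ['query' => $query]];
}

// match on _all
$matchAll = searchParams(['match' => ['_all' => 'hel']]);

// query_string with a trailing wildcard
$wildcard = searchParams(['query_string' => ['fields' => ['_all'], 'query' => 'hel*']]);

// match on the nGram-analyzed field; `and` requires every query gram to match
$byAuthor = searchParams(['match' => ['authors' => ['query' => 'hel', 'operator' => 'and']]]);

foreach ([$matchAll, $wildcard, $byAuthor] as $params) {
    echo json_encode($params['body']), PHP_EOL;
    // With the official client: print_r($client->search($params));
}
```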
