Elasticsearch: How to store term vectors

Question

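I'm working on a project that makes heavy use of Elasticsearch, and I rely on the moreLikeThis (MLT) query for several features. The official documentation for the MLT query states: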

In order to speed up analysis, it could help to store term vectors at index time, but at the expense of disk usage.

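This appears in the "How it Works" section. The idea now is to adjust the mapping so that precomputed term vectors are stored. The problem is that the documentation does not make clear how this should be done. On the one hand, the MLT documentation gives an example mapping like this: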

curl -s -XPUT 'http://localhost:9200/imdb/' -d '{
  "mappings": {
    "movies": {
      "properties": {
        "title": {
          "type": "string",
          "term_vector": "yes"
        },
        "description": {
          "type": "string"
        },
        "tags": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed",
              "term_vector": "yes"
            }
          }
        }
      }
    }
  }
}'

On the other hand, the Term Vectors documentation provides a mapping in its Example 1 section, as follows:

curl -s -XPUT 'http://localhost:9200/twitter/' -d '{
  "mappings": {
    "tweet": {
      "properties": {
        "text": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "store" : true,
          "index_analyzer" : "fulltext_analyzer"
        },
        "fullname": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "index_analyzer" : "fulltext_analyzer"
        }
      }
    }
    ....

This is supposed to create an index that stores term vectors, payloads, etc.

So the question is: which mapping should be used? Is this a flaw in the documentation, or am I missing something?

Answer 1

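You're right: the current version of the documentation does not seem to state this explicitly, but the documentation for the upcoming 2.0 release explains it in more detail: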

Term vectors contain information about the terms produced by the analysis process, including:

a list of terms.

the position (or order) of each term.

the start and end character offsets mapping the term to its origin in the original string.

These term vectors can be stored so that they can be retrieved for a particular document.

The term_vector setting accepts:

no: No term vectors are stored. (default)

yes: Just the terms in the field are stored

with_positions: Terms and positions are stored

with_offsets: Terms and character offsets are stored

with_positions_offsets: Terms, positions, and character offsets are stored
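To illustrate how one of these values fits into a mapping, here is a minimal sketch in the same Elasticsearch 1.x-era syntax as the mappings in the question. The index name `articles`, the type `article`, the field `body`, and the `ES_URL` environment variable are hypothetical placeholders of mine, not names from the documentation. `with_positions_offsets` is just one of the accepted values listed above and could be swapped for any other.

```shell
#!/bin/sh
# Sketch only: index, type, and field names are hypothetical placeholders.
# "term_vector" may be set to any of the values listed above.
MAPPING='{
  "mappings": {
    "article": {
      "properties": {
        "body": {
          "type": "string",
          "term_vector": "with_positions_offsets"
        }
      }
    }
  }
}'

# Print the request body so it can be inspected before sending.
printf '%s\n' "$MAPPING"

# Create the index only when ES_URL (an assumed variable, for example
# http://localhost:9200) is set, so the script is safe to run offline.
if [ -n "${ES_URL:-}" ]; then
  curl -s -XPUT "$ES_URL/articles/" -d "$MAPPING"
fi
```

As far as `term_vector` is concerned, both mappings quoted in the question follow this same pattern and differ in the value they choose for the setting.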
