Is it possible to add a new similarity metric to an existing index in Elasticsearch?

Suppose there is an existing index with a custom BM25 similarity metric, like this:

{
    "settings": {
        "index": {
            "similarity": {
                "BM25_v1": {
                    "type": "BM25",
                    "b": 1.0
                }
            },
            "number_of_replicas": 0,
            "number_of_shards": 3,
            "refresh_interval": "120s"
        }
    }
}

This similarity metric is used for two fields:

{
    'some_field': {
        'type': 'text',
        'norms': 'true',
        'similarity': 'BM25_v1'
    },
    'another_field': {
        'type': 'text',
        'norms': 'true',
        'similarity': 'BM25_v1'
    },
}

Now I would like to know whether it is possible to add another similarity metric (BM25_v2) to the same index and use this new metric for another_field, like this:

"index": {
    "similarity": {
        # The existing metric, not changed.
        "BM25_v1": {
            "type": "BM25",
            "b": 1.0
        },
        # The new similarity metric for this index.
        "BM25_v2": {  
            "type": "BM25",
            "b": 0.0
        }
    }
}

# ... and use the new metric for one of the fields:

{
    'some_field': {
        'type': 'text',
        'norms': 'true',
        'similarity': 'BM25_v1' # This field uses the same old metric.
    },
    'another_field': {
        'type': 'text',
        'norms': 'true',
        'similarity': 'BM25_v2' # The new metric is used for this field.
    },
}

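I could not find any example of this scenario in the documentation, so I am not sure whether it is possible.

Update: I have seen this old, still-open issue concerning dynamic updates of similarity metrics in Elasticsearch, but it is not entirely clear from that discussion what is and isn't possible. There have also been some attempts to support some level of similarity update, but I don't think they are documented (for example, it is possible to change the parameters of an existing similarity metric, such as b or k1 of an existing BM25-based metric).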

TL;DR

I believe you can't. Changing the similarity of an existing field is rejected:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Mapper for [title] conflicts with existing mapper:\n\tCannot update parameter [similarity] from [my_similarity] to [my_similarity_v2]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Mapper for [title] conflicts with existing mapper:\n\tCannot update parameter [similarity] from [my_similarity] to [my_similarity_v2]"
  },
  "status" : 400
}

If you want to use the new metric, I believe you will have to create a new field and re-index the data.
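
The "new field" route could be sketched roughly as follows. This is only an illustration, not something the reproduction below verifies: it builds the request bodies in Python without sending them, and it assumes that BM25_v2 has already been added to the index settings and that a multi-field can be added to an existing text field. The index name my_index and the sub-field name v2 are hypothetical.

```python
import json

def add_subfield_requests(index, field, old_similarity, subfield, new_similarity):
    """Build (method, path, body) tuples that add a sub-field scored with a
    different similarity, then re-index the documents in place."""
    mapping = {
        "properties": {
            field: {
                "type": "text",
                # The existing parameter is restated unchanged to avoid a conflict.
                "similarity": old_similarity,
                "fields": {
                    # Hypothetical sub-field that uses the new similarity.
                    subfield: {"type": "text", "similarity": new_similarity}
                },
            }
        }
    }
    return [
        ("PUT", f"/{index}/_mapping", mapping),
        # Documents indexed earlier only get the sub-field after being re-indexed.
        ("POST", f"/{index}/_update_by_query?conflicts=proceed", None),
    ]

subfield_steps = add_subfield_requests(
    "my_index", "another_field", "BM25_v1", "v2", "BM25_v2"
)
for method, path, body in subfield_steps:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

Under these assumptions, queries that should be scored with the new metric would target another_field.v2 instead of another_field.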

To reproduce

PUT /70973345
{
  "settings": {
    "index": {
      "similarity": {
        "my_similarity": {
          "type": "BM25",
          "b": 1.0
        }
      }
    }
  }
}

PUT /70973345/_mapping
{
  "properties" : {
    "title" : { "type" : "text", "similarity" : "my_similarity" }
  }
}

We insert some dummy data, then retrieve it.

POST /70973345/_doc
{
  "title": "I love rock'n roll"
}

POST /70973345/_doc
{
  "title": "I love pasta al'arabita"
}

POST /70973345/_doc
{
  "title": "pasta rock's"
}

GET /70973345/_search?explain=true
{
  "query": {
    "match": {
      "title": "pasta"
    }
  }
}

If we try to update the index settings without closing the index first, we get an error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Can't update non dynamic settings ...."
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Can't update non dynamic settings ...."
  },
  "status" : 400
}

POST /70973345/_close?wait_for_active_shards=0
PUT /70973345/_settings
{
  "index": {
    "similarity": {
      "my_similarity": {
        "type": "BM25",
        "b": 1.0
      },
      "my_similarity_v2": {
        "type": "BM25",
        "b": 0
      }
    }
  }
}
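
For reference, the whole settings change can be written as an ordered sequence. The sketch below only builds the requests in Python and prints them; it sends nothing. The final _open step is not part of the requests above and is added here on the assumption that the index should be searchable again afterwards.

```python
import json

def similarity_update_requests(index, similarities):
    """Build the close -> update settings -> open sequence as (method, path, body)."""
    return [
        # Similarity settings are not dynamic, so the index is closed first.
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", {"index": {"similarity": similarities}}),
        # Assumed extra step: reopen the index so it can serve requests again.
        ("POST", f"/{index}/_open", None),
    ]

settings_steps = similarity_update_requests(
    "70973345",
    {
        "my_similarity": {"type": "BM25", "b": 1.0},
        "my_similarity_v2": {"type": "BM25", "b": 0},
    },
)
for method, path, body in settings_steps:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```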

The settings update works, but changing the field's similarity in the mapping does not:

PUT /70973345/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "similarity": "my_similarity_v2"
    }
  }
}

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Mapper for [title] conflicts with existing mapper:\n\tCannot update parameter [similarity] from [my_similarity] to [my_similarity_v2]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Mapper for [title] conflicts with existing mapper:\n\tCannot update parameter [similarity] from [my_similarity] to [my_similarity_v2]"
  },
  "status" : 400
}

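It does not work regardless of whether the index is open or closed.

This leads me to believe that it is not possible. You will probably need to re-index the existing data into a new index.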
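
A minimal sketch of that re-index route, using the field and similarity names from the question. It only builds the two request bodies in Python and does not talk to a cluster; the index names my_index and my_index_v2 are hypothetical.

```python
import json

def reindex_requests(old_index, new_index):
    """Build (method, path, body) tuples: create a new index that defines both
    similarities, then copy the documents over with the _reindex API."""
    create_body = {
        "settings": {
            "index": {
                "similarity": {
                    "BM25_v1": {"type": "BM25", "b": 1.0},
                    "BM25_v2": {"type": "BM25", "b": 0.0},
                }
            }
        },
        "mappings": {
            "properties": {
                # Keeps the old metric.
                "some_field": {"type": "text", "similarity": "BM25_v1"},
                # Uses the new metric from the moment the field is created.
                "another_field": {"type": "text", "similarity": "BM25_v2"},
            }
        },
    }
    reindex_body = {"source": {"index": old_index}, "dest": {"index": new_index}}
    return [
        ("PUT", f"/{new_index}", create_body),
        ("POST", "/_reindex", reindex_body),
    ]

reindex_steps = reindex_requests("my_index", "my_index_v2")
for method, path, body in reindex_steps:
    print(method, path)
    print(json.dumps(body, indent=2))
```

Other settings from the original index (shards, replicas, refresh interval) are left out here and would need to be carried over as required.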