Can Elasticsearch guarantee correctness when updating one document at high frequency?

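I have been working on a project that involves a large number of updates to Elasticsearch, and I found that when updates are applied to a single document at high frequency, consistency cannot be guaranteed.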

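For each update, this is what we do (Scala code). Note that we have to explicitly remove the original fields and replace them with the new ones, because 'merge' is not what we want (_update actually merges in Elasticsearch).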

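// Uses the elastic4s DSL (presumably import com.sksamuel.elastic4s.ElasticDsl._);
// client, log and IndexType are defined elsewhere in our code.
def replaceFields(alarmId: String, newFields: Map[String, Any]): Future[BulkResponse] = {
  // Build a scripted update that removes a single field from the document.
  def removeField(fieldName: String): UpdateDefinition = {
    val removeScript = s"""ctx._source.remove("$fieldName")"""
    log.info("script: " + removeScript)
    update id alarmId in IndexType script removeScript
  }

  // One bulk request: remove every field first, then write the new values.
  val removals = newFields.keys.toList.map(removeField)
  val writeNewFields = update id alarmId in IndexType doc newFields
  client.execute {
    bulk(removals :+ writeNewFields: _*)
  }
}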
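
To make the merge-versus-replace distinction concrete, here is a minimal, self-contained sketch in plain Scala that does not talk to Elasticsearch at all. It assumes that a partial-document update merges nested objects recursively and overwrites everything else, which is my reading of the Update API docs. The names MergeVsReplace, mergeDoc and replaceTopLevel are made up for this illustration.

```scala
object MergeVsReplace {
  type Doc = Map[String, Any]

  // Assumed model of a partial-document update: nested objects are merged
  // key by key, any other value is simply overwritten.
  def mergeDoc(existing: Doc, partial: Doc): Doc =
    partial.foldLeft(existing) { case (acc, (key, newValue)) =>
      (acc.get(key), newValue) match {
        case (Some(oldMap: Map[_, _]), newMap: Map[_, _]) =>
          acc.updated(key, mergeDoc(oldMap.asInstanceOf[Doc], newMap.asInstanceOf[Doc]))
        case _ =>
          acc.updated(key, newValue)
      }
    }

  // What we actually want: drop each top-level field being updated,
  // then store the new value as-is.
  def replaceTopLevel(existing: Doc, newFields: Doc): Doc =
    (existing -- newFields.keys) ++ newFields

  def main(args: Array[String]): Unit = {
    val stored: Doc = Map("status" -> Map("level" -> "high", "ackedBy" -> "alice"))
    val change: Doc = Map("status" -> Map("level" -> "low"))
    println(mergeDoc(stored, change))        // "ackedBy" survives the merge
    println(replaceTopLevel(stored, change)) // "ackedBy" is gone
  }
}
```

With the merge, the stale "ackedBy" key stays in the document, which is why we remove the fields before writing them.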

No. You can increase the write consistency level to all (see "Understanding the write_consistency and quorum rule of Elasticsearch" for some discussion around this; also see the docs https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-index_.html#index-consistency) and that would get you closer. But Elasticsearch does not have any linearizability guarantees (see https://aphyr.com/posts/317-jepsen-elasticsearch for examples and https://aphyr.com/posts/313-strong-consistency-models for definitions), and it is not hard to devise scenarios in which ES ends up inconsistent.
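
As a rough sketch of what the consistency setting controls in Elasticsearch 2.x: as far as I understand the docs, it determines how many shard copies (primary included) must be active before a write is attempted, with quorum computed as int((primary + number_of_replicas) / 2) + 1 and a special case when there is only one replica. The names WriteConsistency and requiredActiveCopies below are hypothetical and only model that rule; they are not part of any Elasticsearch API.

```scala
object WriteConsistency {
  sealed trait Level
  case object One extends Level
  case object Quorum extends Level
  case object All extends Level

  // Number of active shard copies (primary included) that must be available
  // before a write proceeds, under the assumptions stated above.
  def requiredActiveCopies(level: Level, numberOfReplicas: Int): Int = {
    val totalCopies = 1 + numberOfReplicas
    level match {
      case One => 1
      case All => totalCopies
      case Quorum =>
        // Documented special case: with at most one replica (two copies in
        // total), the primary alone is enough.
        if (numberOfReplicas <= 1) 1 else totalCopies / 2 + 1
    }
  }

  def main(args: Array[String]): Unit =
    for (replicas <- 0 to 4)
      println(s"replicas=$replicas quorum=${requiredActiveCopies(Quorum, replicas)} all=${requiredActiveCopies(All, replicas)}")
}
```

Note that this is a check made before the write starts, not a guarantee about how many copies end up holding the write, which is one reason raising the level only gets you closer.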

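That said, most of the time it tends to be consistent. But in a high-update environment you are going to put a lot of GC pressure on the JVM to clean up old documents. I assume you know how updates work in ES (an update does not modify the document in place; the old document is deleted and a new one is indexed), but if you don't, it is also worth looking at https://www.elastic.co/guide/en/elasticsearch/reference/current/_updating_documents.html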
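
To illustrate why frequent updates to one document generate so much cleanup work, here is a toy in-memory model, assuming only what the linked page describes: an update deletes the old document and indexes a new one. ToyIndex, StoredDoc and upsert are invented for this sketch and say nothing about how Lucene segments are actually laid out.

```scala
object UpdateModel {
  final case class StoredDoc(id: String, version: Long, source: Map[String, Any], deleted: Boolean)

  // Append-only toy index: an update never edits an entry in place.
  final case class ToyIndex(entries: Vector[StoredDoc] = Vector.empty) {
    // The single non-deleted copy of a document, if any.
    def live(id: String): Option[StoredDoc] =
      entries.find(d => d.id == id && !d.deleted)

    // Mark the current copy as deleted and append a new copy with version + 1.
    def upsert(id: String, source: Map[String, Any]): ToyIndex = {
      val nextVersion = live(id).map(_.version + 1).getOrElse(1L)
      val tombstoned = entries.map(d => if (d.id == id && !d.deleted) d.copy(deleted = true) else d)
      ToyIndex(tombstoned :+ StoredDoc(id, nextVersion, source, deleted = false))
    }

    // Stale copies that still occupy space until something cleans them up.
    def deletedCount: Int = entries.count(_.deleted)
  }

  def main(args: Array[String]): Unit = {
    val afterUpdates = (1 to 1000).foldLeft(ToyIndex()) { (index, i) =>
      index.upsert("alarm-1", Map("count" -> i))
    }
    println(s"live version: ${afterUpdates.live("alarm-1").map(_.version)}")
    println(s"stale copies: ${afterUpdates.deletedCount}")
  }
}
```

In this model every write to the same id leaves one more stale copy behind, so a single hot document accumulates garbage in proportion to its update rate.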