Pattern analyzer does not work with UUIDs in Elasticsearch

Question

I am using Elasticsearch 7.x and created an accounts index with the following mapping.

curl --location --request PUT 'http://localhost:9200/accounts' \
--header 'Content-Type: application/json' \
--data-raw '{
    "mappings": {
            "properties": {
                "type": {"type": "keyword"},
                "id": {"type": "keyword"},
                "label": {"type": "keyword"},
                "lifestate": {"type": "keyword"},
                "name": {"type": "keyword"},
                "users": {"type": "text"}
            }
    }
}'

I store the users as an array. In my use case an account can have n users, so I store them in the following format.

curl --location --request PUT 'http://localhost:9200/accounts/_doc/account3' \
--header 'Content-Type: application/json' \
--data-raw '{
    "id" : "account_uuid",
    "name" : "Account_Description",
    "users" : [
        "id:6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE",
        "id:9611e2be-784f-4a07-b5de-564b3820a660~~status:INACTIVE"
    ]
}'
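
As an illustration of this storage format, here is a small Python sketch that builds and parses entries of the form `id:<uuid>~~status:<STATUS>`. The helper names `encode_user` and `decode_user` are made up for this example and are not part of the original setup.

```python
def encode_user(user_id, status):
    """Build one `users` array entry in the `id:...~~status:...` format."""
    return f"id:{user_id}~~status:{status}"


def decode_user(entry):
    """Split an entry back into a dict such as {'id': ..., 'status': ...}."""
    fields = {}
    for part in entry.split("~~"):
        # partition() splits on the first ':' only, so the value is kept whole
        key, _, value = part.partition(":")
        fields[key] = value
    return fields


entry = encode_user("6de57db5-8fdb-4a39-ab46-21af623692ea", "ACTIVE")
print(entry)
print(decode_user(entry))
```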

To search by user ID and status, I created a pattern analyzer that splits on the ~~ symbol, as shown below.

curl --location --request PUT 'http://localhost:9200/accounts/_settings' \
--header 'Content-Type: application/json' \
--data-raw '{
  "settings": {
    "analysis": {
      "analyzer": {
        "p_analyzer": { 
          "type": "pattern",
          "pattern" :"~~"
        }
      }
    }
  }
}'
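
To make the effect of this analyzer concrete, the following is a rough Python simulation of how a pattern analyzer that splits on `~~` would tokenize one of the stored `users` entries. It is only a sketch: `simulate_pattern_analyzer` is a hypothetical helper, not an Elasticsearch API, and it assumes the pattern analyzer's default behavior of lowercasing tokens.

```python
import re


def simulate_pattern_analyzer(text, pattern, lowercase=True):
    """Approximate a pattern analyzer: split on the regex, drop empty tokens."""
    tokens = [t for t in re.split(pattern, text) if t]
    return [t.lower() for t in tokens] if lowercase else tokens


entry = "id:6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE"
tokens = simulate_pattern_analyzer(entry, "~~")
print(tokens)
# ['id:6de57db5-8fdb-4a39-ab46-21af623692ea', 'status:active']
```

With this pattern the whole `id:<uuid>` part stays a single token. A query analyzed this way can only match if the `users` field was indexed into the same tokens, and the mapping in this question does not assign an analyzer to `users`.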

And the search query call is:

curl --location --request GET 'http://localhost:9200/accounts/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
    "query": {
        "bool": {
            "filter": [ 
                { "term": {"id": "account_uuid"} },
                { "match" : {"users" : {
                    "query" : "id:<user_id>",
                    "analyzer" : "p_analyzer"
                }}}
            ]   
        }
    }
}'

This works if the user ID is a plain string, that is, when the user ID is stored in a non-UUID format. It does not work for IDs in UUID format. How can I make it work?

Answer 1

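Modify your analyzer to also split on the hyphen (`-`). This should solve your issue, because it creates a separate token for each part of the UUID.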

{
  "settings": {
    "analysis": {
      "analyzer": {
        "p_analyzer": {
          "type":      "pattern",
          "pattern":   "~~|-",
          "lowercase": true
        }
      }
    }
  }
}

The analyzer above generates the following tokens:

POST /your-index/_analyze

{
  "text" : "6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE",
  "analyzer" : "p_analyzer"
}

Generated tokens:

{
    "tokens": [
        {
            "token": "6de57db5",
            "start_offset": 0,
            "end_offset": 8,
            "type": "word",
            "position": 0
        },
        {
            "token": "8fdb",
            "start_offset": 9,
            "end_offset": 13,
            "type": "word",
            "position": 1
        },
        {
            "token": "4a39",
            "start_offset": 14,
            "end_offset": 18,
            "type": "word",
            "position": 2
        },
        {
            "token": "ab46",
            "start_offset": 19,
            "end_offset": 23,
            "type": "word",
            "position": 3
        },
        {
            "token": "21af623692ea",
            "start_offset": 24,
            "end_offset": 36,
            "type": "word",
            "position": 4
        },
        {
            "token": "status:active",
            "start_offset": 38,
            "end_offset": 51,
            "type": "word",
            "position": 5
        }
    ]
}

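Now a search for 6de57db5-8fdb-4a39-ab46-21af623692ea is broken into 6de57db5, 8fdb, 4a39 and so on. These match the tokens generated at index time, so the document appears in the search results.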
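
The matching described above can be sketched in Python. This is a simulation only, under the assumption that both the indexed value and the query string are tokenized by splitting on `~~` or `-` and then lowercased. `tokenize` and `shared_tokens` are hypothetical helpers, not Elasticsearch calls.

```python
import re

PATTERN = "~~|-"  # same regex as the analyzer in this answer


def tokenize(text):
    """Split on `~~` or `-`, drop empty pieces, lowercase the rest."""
    return [t.lower() for t in re.split(PATTERN, text) if t]


def shared_tokens(indexed_text, query_text):
    """Return the query tokens that also occur among the indexed tokens."""
    indexed = set(tokenize(indexed_text))
    return [t for t in tokenize(query_text) if t in indexed]


indexed_value = "6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE"
query = "6de57db5-8fdb-4a39-ab46-21af623692ea"

print(tokenize(indexed_value))
print(shared_tokens(indexed_value, query))
```

Because a `match` query uses the `or` operator by default, a document sharing any one of these tokens is a candidate hit. Setting `"operator": "and"` on the query is one way to require every part of the UUID to match.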


elasticsearch

elasticsearch-analyzers