RegEx for matching any 2 letters + any 6 numbers in Lucene

Question

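I'm writing an exclude filter for a significant terms aggregation in Elasticsearch. I want to exclude from the results any term that matches the pattern (any 2 letters)(any 6 digits), e.g. AB123456.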

I tried:

[a-zA-Z]{2}&<0-9>{6}

but when I try to update my visualization, Kibana gives an error:

[x_content_parse_exception] [1:72] [significant_terms] exclude doesn't support values of type: START_OBJECT

This JavaScript regex seems to do what I want:

([a-zA-Z]{2}\d{6})

but I'm struggling to translate it into Lucene syntax.

Answer 1

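You don't need the "&" here. In Lucene regex syntax it is the intersection operator, so it tries to match both patterns at the same time rather than one after the other.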
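To illustrate the difference, here is a rough Python sketch. Python's `re` module has no `&` operator, so the helper `matches_intersection` (a hypothetical name) emulates it by requiring the whole term to match both sub-patterns, and `[0-9]` stands in for Lucene's `<0-9>`. It is only meant to show why the intersection can never match a term like AB123456, not how Lucene evaluates the pattern internally.

```python
import re

LETTERS = r"[a-zA-Z]{2}"
DIGITS = r"[0-9]{6}"  # stand-in for Lucene's <0-9>{6}


def matches_intersection(term: str) -> bool:
    """Emulate A&B: the whole term must match A and must also match B."""
    return bool(re.fullmatch(LETTERS, term)) and bool(re.fullmatch(DIGITS, term))


def matches_concatenation(term: str) -> bool:
    """Emulate AB: A followed directly by B, anchored to the whole term."""
    return bool(re.fullmatch(LETTERS + DIGITS, term))


print(matches_intersection("AB123456"))   # False: no term is 2 letters and 6 digits at once
print(matches_concatenation("AB123456"))  # True
```

`re.fullmatch` is used because Lucene regular expressions are anchored to the whole term.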

Based on my understanding of the question, here is a solution. It excludes documents that contain 2 letters followed by 6 digits:

PUT /stackoverflowtest/_doc/1
{
    "value" : "AB123456"
}

PUT /stackoverflowtest/_doc/2
{
    "value" : "AB1234Z"
}

PUT /stackoverflowtest/_doc/3
{
    "value" : "This document has one at the end: AB123456"
}

POST /stackoverflowtest/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "regexp": {
            "value": "[a-z]{2}<0-9>{6}"
          }
        }
      ]
    }
  }
}

This returns only one document, the one with the value "AB1234Z", which has no token made of 2 letters followed by 6 digits.
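The result can be checked with a small Python simulation. This is a sketch under assumptions: it supposes `value` is a `text` field indexed with the default analyzer (which lowercases and splits text into tokens), approximates that analyzer with a crude `\w+` tokenizer, and uses `[0-9]` in place of Lucene's `<0-9>`. The helper names are hypothetical.

```python
import re

# The three documents indexed above, keyed by their _id.
docs = {
    1: "AB123456",
    2: "AB1234Z",
    3: "This document has one at the end: AB123456",
}

# Same pattern as the regexp query, with [0-9] standing in for <0-9>.
PATTERN = re.compile(r"[a-z]{2}[0-9]{6}")


def tokens(text: str) -> list[str]:
    """Crude approximation of the analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def has_matching_term(text: str) -> bool:
    """True if any single token matches the pattern in full."""
    return any(PATTERN.fullmatch(token) for token in tokens(text))


# must_not keeps only the documents with no matching term.
hits = [doc_id for doc_id, value in docs.items() if not has_matching_term(value)]
print(hits)  # [2]
```

Document 3 is excluded because one of its tokens matches on its own, even though the full sentence does not.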

Answer 2

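Here is the full JSON I used to get the result I wanted. I'm using the Significant Terms aggregator to extract keywords from the notes of support tickets. I needed to set a background_filter and then exclude the text pattern from my original question.

Document structure:

summary: the name of the error message

notes: the details of the error, including usernames I don't care about, e.g. AB123456.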

"significant_terms": {
        "field": "notes",
        "size": 10,
        "background_filter": {
          "query_string": {
            "query": "summary: ErrorMessage1* OR ErrorMessage2*",
            "analyze_wildcard": "true"
          }
        },
        "exclude": "[a-zA-Z]{2}[0-9]{6}"
      }
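
As a rough illustration of what the `exclude` pattern does to the candidate terms, here is a Python sketch. The sample terms are made up for the example, and `re.fullmatch` is used on the assumption that the pattern must match a term in full, as Lucene regular expressions do.

```python
import re

# Same pattern as the "exclude" value above, passed as a plain string.
EXCLUDE = re.compile(r"[a-zA-Z]{2}[0-9]{6}")

# Hypothetical terms extracted from the notes field.
candidate_terms = ["timeout", "ab123456", "connection", "xy654321", "ab1234z"]

# Drop every term that matches the pattern in full.
kept = [term for term in candidate_terms if not EXCLUDE.fullmatch(term)]
print(kept)  # ['timeout', 'connection', 'ab1234z']
```

Terms shaped like usernames are dropped, while "ab1234z" stays because it does not have 6 digits after the 2 letters.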
