基于 Elastic 搜索中的定界符对字符串进行标记

Tokenize string based on delimiter in Elastic search

我需要使用 | 分隔符将字符串 36-3031.00|36-3021.00 标记为 36-3031.0036-3021.00

我试过这样,

PUT text
{
   "test1": {
  "settings": {
    "analysis" : {
            "tokenizer" : {
                "pipe_tokenizer" : {
                    "type" : "pattern",
                    "pattern" : "|"
                }
            },
            "analyzer" : {
                "pipe_analyzer" : {
                    "type" : "custom",
                    "tokenizer" : "pipe_tokenizer"
                }
            }
        }
  },
  "mappings": {
    "mytype": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "pipe_analyzer"
        }
      }
    }
  }
}}

但它并不准确。谁能解决这个用例?

以下是您应该使用的正确映射(包括 REST PUT 命令中的索引名称)。 | 字符需要转义:

DELETE test1
PUT test1
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "pipe_tokenizer": {
          "type": "pattern",
          "pattern": "\|"
        }
      },
      "analyzer": {
        "pipe_analyzer": {
          "type": "custom",
          "tokenizer": "pipe_tokenizer"
        }
      }
    }
  },
  "mappings": {
    "mytype": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "pipe_analyzer"
        }
      }
    }
  }
}

POST /test1/mytype/1
{"text":"36-3031.00|36-3021.00"}

GET /test1/_analyze
{"field":"text","text":"36-3031.00|36-3021.00"}