How to use Elasticsearch to query emails from text with a regex

I want to query all email addresses from text stored in Elasticsearch. I currently use the following query condition to get the query result:

{
    "query": {
        "regexp": {
            "sys_content": {
                "value": "[-a-zA-Z0-9_]+(\\.[-a-zA-Z0-9_]+)*@[-a-zA-Z0-9_]+(\\.[-a-zA-Z0-9_]+)+",
                "flags_value": 65535,
                "max_determinized_states": 10000,
                "boost": 1.0
            }
        }
    },
    "highlight": {
        "pre_tags": [
            "<span style='color:red'>"
        ],
        "post_tags": [
            "</span>"
        ],
        "fragment_size": 100,
        "require_field_match": true,
        "fields": {
            "sys_content": {}
        }
    }
}



Here is a solution that uses the uax_url_email tokenizer. It does most of the work at index time, which makes your searches faster.
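
To make "work at index time" more concrete, here is a rough Python sketch of the idea: split the text into tokens once, then keep only the tokens that are email addresses. It is an approximation under stated assumptions, not how Lucene does it: a plain whitespace split stands in for the uax_url_email tokenizer (which follows Unicode text segmentation rules and has its own email grammar), and the regex from the question stands in for the <EMAIL> token type. The name keep_email_tokens is hypothetical.

```python
import re

# Assumption: the regex from the question approximates what the tokenizer
# would classify as <EMAIL>; the real uax_url_email grammar is different.
EMAIL_RE = re.compile(
    r"[-a-zA-Z0-9_]+(\.[-a-zA-Z0-9_]+)*@[-a-zA-Z0-9_]+(\.[-a-zA-Z0-9_]+)+"
)


def keep_email_tokens(text):
    """Split on whitespace, then keep only tokens that are entirely an email."""
    tokens = text.split()  # stand-in for the uax_url_email tokenizer
    # Stand-in for a keep_types filter that keeps only <EMAIL> tokens.
    return [tok for tok in tokens if EMAIL_RE.fullmatch(tok)]


sample = "test email@gmail.com not@ a@a email another@email.fr"
email_tokens = keep_email_tokens(sample)
print(email_tokens)  # ['email@gmail.com', 'another@email.fr']
```

Once only such tokens are stored in a dedicated field, the search no longer needs an email pattern at all: any document with at least one token in that field contains an email address.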

Create the index with a custom analyzer that produces <EMAIL> tokens, plus a filter that keeps only those tokens:

PUT test-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": ["extract_email"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "uax_url_email",
          "max_token_length": 50
        }
      },
      "filter": {
        "extract_email": {
          "type": "keep_types",
          "types": [ "<EMAIL>" ]
        }
      }
    }
  },
  "mappings" : {
    "properties" : {
      "sys_content" : {
        "type" : "text",
        "fields": {
          "email": {
            "type": "text",
            "analyzer": "my_analyzer"
          }
        }
      }
    }
  }
}


Index a sample document:

POST test-index/_doc
{
  "sys_content": "test email@gmail.com not@ a@a email another@email.fr"
}

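Finally, search and highlight the emails. Thanks to the uax_url_email tokenizer, the emails have already been identified at index time, so at search time you only need to match any token in the sys_content.email field: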

GET test-index/_search
{
  "query": {
    "regexp": {
      "sys_content.email": {
        "value": ".*",
        "flags": "ALL",
        "case_insensitive": true,
        "max_determinized_states": 10000,
        "rewrite": "constant_score"
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "<span style='color:red'>"
    ],
    "post_tags": [
      "</span>"
    ],
    "fragment_size": 100,
    "require_field_match": true,
    "fields": {
      "sys_content.email": {}
    }
  }
}

The response highlights both email addresses:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test-index",
        "_type" : "_doc",
        "_id" : "GxSbM3oBJxdf7EzzH4jM",
        "_score" : 1.0,
        "_source" : {
          "sys_content" : "test email@gmail.com not@ a@a email another@email.fr"
        },
        "highlight" : {
          "sys_content.email" : [
            "test <span style='color:red'>email@gmail.com</span> not@ a@a email <span style='color:red'>another@email.fr</span>"
          ]
        }
      }
    ]
  }
}

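
If you need the addresses themselves rather than marked-up text, one option is to pull them out of the highlight fragments on the client side. The following is a minimal Python sketch, assuming the search response has already been parsed into a dict and that the highlight tags are <span style='color:red'> and </span>, as they appear in the highlighted fragment. The helper name extract_highlighted is hypothetical.

```python
import re

# Assumption: these match the pre_tags / post_tags of the search request.
PRE_TAG = "<span style='color:red'>"
POST_TAG = "</span>"


def extract_highlighted(response, field="sys_content.email"):
    """Collect every highlighted term for `field` from a parsed search response."""
    pattern = re.compile(re.escape(PRE_TAG) + r"(.*?)" + re.escape(POST_TAG))
    found = []
    for hit in response.get("hits", {}).get("hits", []):
        for fragment in hit.get("highlight", {}).get(field, []):
            found.extend(pattern.findall(fragment))
    return found


# Trimmed copy of the response for the sample document.
response = {
    "hits": {
        "hits": [
            {
                "highlight": {
                    "sys_content.email": [
                        "test <span style='color:red'>email@gmail.com</span> "
                        "not@ a@a email "
                        "<span style='color:red'>another@email.fr</span>"
                    ]
                }
            }
        ]
    }
}

emails = extract_highlighted(response)
print(emails)  # ['email@gmail.com', 'another@email.fr']
```

Note that highlighting returns only a limited number of fragments per field by default, so for long documents you may need to raise number_of_fragments in the highlight section to see every address.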