Match only the first occurrence of a phrase

Question

I have the following JSON:

{"field1": "someText",
  "field2": "Text Again",
  "field3": "Text Again"}

I need to match the first occurrence of any phrase that starts with a capital letter (e.g. "Text Again").

I wrote the following:

("[A-Za-z]+\s[A-Za-z]+")

It works fine when I test it with https://regex101.com/, for instance. However, it does not seem to work correctly when used with ReplaceTextWithMapping (Apache NiFi). Is the regex incorrect?
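To see what "first occurrence" means for this pattern outside NiFi, here is a minimal sketch using Python's standard `re` module. NiFi runs on the JVM and uses Java's regex engine; the constructs in this pattern (character classes, `\s`) behave the same way in both, but this only exercises the pattern itself, not ReplaceTextWithMapping. `re.search` stops at the first match, while `re.findall` returns every match.

```python
import re

# The JSON from the question, held as a plain string.
text = '''{"field1": "someText",
  "field2": "Text Again",
  "field3": "Text Again"}'''

# The asker's pattern, unchanged (group 1 wraps the whole match).
pattern = re.compile(r'("[A-Za-z]+\s[A-Za-z]+")')

# re.search returns only the first match: the field2 value, quotes included.
first = pattern.search(text)
print(first.group(1))

# re.findall returns every non-overlapping match (field2 and field3).
print(pattern.findall(text))
```

Note that the pattern does not require the first letter to be uppercase; it happens to skip `"someText"` only because that value contains no whitespace.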

Thanks for your help.

Answer 1

Description

:\s*"\s*(?=[A-Z])(?![^"]*?\s[a-z])([A-Za-z\s]+)"

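This regular expression does the following:

Finds values in what appears to be a JSON-encoded string
Ensures every word starts with a capital letter
Returns the value inside the quotes as capture group 1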
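A minimal sketch of those three points, applying the pattern to the question's JSON with Python's `re` module (assumed here as a stand-in for NiFi's Java regex engine; the lookaheads and classes used behave the same). Group 0 is the whole match including the colon and quotes; group 1 is just the value.

```python
import re

# JSON from the question (field2 and field3 both hold "Text Again").
text = '''{"field1": "someText",
  "field2": "Text Again",
  "field3": "Text Again"}'''

# Answer 1's pattern: a JSON value in which every word starts with a capital.
value_re = re.compile(r':\s*"\s*(?=[A-Z])(?![^"]*?\s[a-z])([A-Za-z\s]+)"')

m = value_re.search(text)   # first occurrence only
print(m.group(0))           # whole match, from the colon to the closing quote
print(m.group(1))           # capture group 1: the text inside the quotes
```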

Example

Live demo

https://regex101.com/r/eO0xW6/1

Source string

{"field1": "someText",
  "field2": "Text again",
  "field3": "Text Again"}

First match

Text Again
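The live demo can be reproduced locally; this Python `re` sketch (an assumed stand-in for the regex101 page) runs the same pattern on the demo's source string, where field2 has a lowercase "again". Only field3's value survives every check.

```python
import re

# Source string from the live demo: field2 now reads "Text again".
demo = '''{"field1": "someText",
  "field2": "Text again",
  "field3": "Text Again"}'''

value_re = re.compile(r':\s*"\s*(?=[A-Z])(?![^"]*?\s[a-z])([A-Za-z\s]+)"')

# field1 fails the (?=[A-Z]) check and field2 fails the
# (?![^"]*?\s[a-z]) check, so the first (and only) match is field3's value.
print(value_re.search(demo).group(1))
print(value_re.findall(demo))
```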

Explanation

Summary

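:\s*" ensures that only JSON values (the text after a key's colon) are checked
\s* matches any whitespace after the opening quote, if present
(?=[A-Z]) ensures the first character in the string is uppercase
(?![^"]*?\s[a-z]) looks for any whitespace followed by a lowercase letter before the closing quote; if one is found, this is not a match
([A-Za-z\s]+) captures all characters inside the quotes
" matches the closing quote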
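To show which part of the summary rejects what, here is a small Python `re` sketch that runs the pattern against hypothetical one-field inputs (the key `"k"` and the sample values are made up for illustration), each paired with the expected capture group 1.

```python
import re

# Answer 1's pattern, unchanged.
pattern = re.compile(r':\s*"\s*(?=[A-Z])(?![^"]*?\s[a-z])([A-Za-z\s]+)"')

# Hypothetical inputs and expected group 1 (None means no match at all).
cases = [
    ('"k": "Text Again"', 'Text Again'),   # passes every check
    ('"k": " Text Again"', 'Text Again'),  # \s* after the opening quote skips the leading space
    ('"k": "someText"', None),             # rejected by (?=[A-Z]): starts with a lowercase letter
    ('"k": "Text again"', None),           # rejected by (?![^"]*?\s[a-z]): a space is followed by "a"
    ('"k": "Text 2"', None),               # rejected by ([A-Za-z\s]+)": the digit is not in the class
]

for line, expected in cases:
    m = pattern.search(line)
    got = m.group(1) if m else None
    print(line, '->', got)
    assert got == expected
```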

Details

NODE                     EXPLANATION
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    [A-Z]                    any character of: 'A' to 'Z'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    [^"]*?                   any character except: '"' (0 or more
                             times (matching the least amount
                             possible))
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    [a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [A-Za-z\s]+              any character of: 'A' to 'Z', 'a' to
                             'z', whitespace (\n, \r, \t, \f, and "
                             ") (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
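The node-by-node table above can also be kept inside the pattern itself. This is a sketch using Python's `re.VERBOSE` flag (Java's `java.util.regex` offers a comparable mode through `Pattern.COMMENTS`); whitespace outside character classes and `#` comments are ignored, so the verbose form should match exactly what the compact form matches.

```python
import re

# Answer 1's pattern in verbose form, one node per line as in the table.
VALUE_RE = re.compile(r'''
    :                  # ':'
    \s*                # optional whitespace before the opening quote
    "                  # opening quote
    \s*                # optional whitespace after the opening quote
    (?=[A-Z])          # next character must be an uppercase letter
    (?![^"]*?\s[a-z])  # no whitespace followed by a lowercase letter before the closing quote
    ([A-Za-z\s]+)      # group 1: letters and whitespace inside the quotes
    "                  # closing quote
''', re.VERBOSE)

# The same pattern in its original compact form, for comparison.
COMPACT_RE = re.compile(r':\s*"\s*(?=[A-Z])(?![^"]*?\s[a-z])([A-Za-z\s]+)"')

for s in ['"field3": "Text Again"', '"field2": "Text again"']:
    a, b = VALUE_RE.search(s), COMPACT_RE.search(s)
    print(s, '->', a and a.group(1), '|', b and b.group(1))
```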

Answer 2

I've posted my findings on this issue to the Apache NiFi mailing list:

http://apache-nifi-developer-list.39713.n7.nabble.com/Issues-with-Regex-used-with-ReplaceTextWithMapping-where-am-I-going-wrong-tc10592.html

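I haven't received any confirmation from the community yet, but it seems to me that, although the regex [A-Z][A-Za-z]*\s[A-Z][A-Za-z]* is correct in this case, the processor (ReplaceTextWithMapping) does not handle whitespace (\s) well, and here the string contains a space between the two words.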
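As a quick check that the expression itself is fine, this Python `re` sketch (an ordinary regex engine standing in for the JVM one, not NiFi) runs the suggested pattern on the question's JSON. It finds "Text Again" first and skips the single-word "someText"; it says nothing about how ReplaceTextWithMapping applies the pattern.

```python
import re

# JSON from the question.
text = '''{"field1": "someText",
  "field2": "Text Again",
  "field3": "Text Again"}'''

# Two capitalized words separated by a single whitespace character.
two_words = re.compile(r'[A-Z][A-Za-z]*\s[A-Z][A-Za-z]*')

m = two_words.search(text)       # first occurrence: the field2 value
print(m.group(0))
print(two_words.findall(text))   # every occurrence, without the quotes
```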
