Match only the first occurrence of a phrase
I have the following JSON:
{"field1": "someText",
"field2": "Text Again",
"field3": "Text Again"}
I need to match only the first occurrence of any phrase that starts with a capital letter (for example, "Text Again").
I wrote the following:
("[A-Za-z]+\s[A-Za-z]+")
It works fine when I test it with https://regex101.com/, for instance. However, it does not seem to work correctly when used in ReplaceTextWithMapping (Apache NiFi). Is the regex incorrect?
Thanks for your help.
Description
:\s*"\s*(?=[A-Z])(?![^"]*?\s[a-z])([A-Za-z\s]+)"
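This regular expression does the following:
- finds the first title-case string on the value side of what appears to be a JSON-encoded string
- ensures that every word is capitalized
- returns the value inside the quotes as capture group 1
Example
Live demo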
https://regex101.com/r/eO0xW6/1
Source string
{"field1": "someText",
"field2": "Text again",
"field3": "Text Again"}
First match
Text Again
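To reproduce the example outside regex101, here is a minimal Python sketch. The helper name first_title_case_value is hypothetical, and Python's re module stands in for the Java regex engine that NiFi uses, so this only checks the pattern itself, not the processor's behavior.

```python
import re

# Pattern from the answer; capture group 1 holds the value inside the quotes.
PATTERN = re.compile(r':\s*"\s*(?=[A-Z])(?![^"]*?\s[a-z])([A-Za-z\s]+)"')

# Source string from the example (field2 has a lowercase second word).
source = '''{"field1": "someText",
"field2": "Text again",
"field3": "Text Again"}'''


def first_title_case_value(text):
    """Return the first title-case value, or None if there is no match."""
    match = PATTERN.search(text)  # search() stops at the first match
    return match.group(1) if match else None


print(first_title_case_value(source))
```

Because search() returns only the leftmost match, "Text again" in field2 is skipped by the negative lookahead and the value of field3 is reported.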
Explanation
Summary
:\s*"
Ensures that only the value side of the JSON is examined
\s*
Matches any whitespace after the opening quote, if present
(?=[A-Z])
Ensures that the first character in the string is uppercase
(?![^"]*?\s[a-z])
Looks for any whitespace followed by a lowercase character. If one is found, this is not a match
([A-Za-z\s]+)
Captures all characters inside the quotes
"
Matches the closing quote
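The same breakdown can be written as a commented pattern. This is a sketch in Python using re.VERBOSE, which ignores unescaped whitespace and # comments outside character classes; the names VERBOSE_PATTERN and capture are made up for illustration.

```python
import re

# The answer's pattern, split into the parts described in the summary.
VERBOSE_PATTERN = re.compile(r'''
    :\s*"               # colon, optional whitespace, opening quote (value side only)
    \s*                 # optional whitespace after the opening quote
    (?=[A-Z])           # the first character must be uppercase
    (?![^"]*?\s[a-z])   # fail if any whitespace is followed by a lowercase letter
    ([A-Za-z\s]+)       # group 1: the letters and whitespace inside the quotes
    "                   # closing quote
''', re.VERBOSE)


def capture(text):
    """Return capture group 1 of the first match, or None."""
    match = VERBOSE_PATTERN.search(text)
    return match.group(1) if match else None


print(capture('{"k": "Text Again"}'))
print(capture('{"k": "Text again"}'))
```

Note that a single capitalized word such as "Hello" also matches, since the pattern only rejects values in which whitespace is followed by a lowercase letter.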
Detailed
NODE                     EXPLANATION
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    [A-Z]                    any character of: 'A' to 'Z'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    [^"]*?                   any character except: '"' (0 or more
                             times (matching the least amount
                             possible))
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    [a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [A-Za-z\s]+              any character of: 'A' to 'Z', 'a' to
                             'z', whitespace (\n, \r, \t, \f, and "
                             ") (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  "                        '"'
----------------------------------------------------------------------
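I have posted my findings on this issue to the Apache NiFi mailing list:
I have not yet received any confirmation from the community, but it seems to me that although the regex [A-Z][A-Za-z]*\s[A-Z][A-Za-z]* is correct in this case, the processor (ReplaceTextWithMapping) does not handle whitespace (\s) well, and the string contains a space between the two words.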
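As a sanity check on the regex itself, the sketch below runs it against the JSON from the question in Python. It does not reproduce ReplaceTextWithMapping, so it says nothing about how the processor treats whitespace. The literal-space variant WITH_SPACE is only an assumption about something one could try in NiFi, not a confirmed workaround, and first_match is a hypothetical helper.

```python
import re

# JSON from the question: both field2 and field3 contain "Text Again".
json_text = '''{"field1": "someText",
"field2": "Text Again",
"field3": "Text Again"}'''

# Regex from the mailing-list post, using the \s class.
WITH_CLASS = re.compile(r'[A-Z][A-Za-z]*\s[A-Z][A-Za-z]*')
# Assumption: same regex with a literal space instead of \s.
WITH_SPACE = re.compile(r'[A-Z][A-Za-z]* [A-Z][A-Za-z]*')


def first_match(pattern, text):
    """Return (matched text, start offset) of the first match, or None."""
    match = pattern.search(text)
    return (match.group(0), match.start()) if match else None


print(first_match(WITH_CLASS, json_text))
print(first_match(WITH_SPACE, json_text))
```

In a standard regex engine both variants find the same first occurrence, which is consistent with the view that the expression is fine and the difference lies in the processor.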