Regex with 2 capture groups, "key=value" or "value_only"

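I am trying to build a regex that matches either key=value or value_only, where in the key=value case the value may itself contain = signs. The key should go into capture group 1 and the value into capture group 2. The example below uses R/stringr, which runs on the ICU engine. I have not found any combination of greedy, possessive, and lazy quantifiers that makes this work. Am I missing something?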

library(stringr)

data <- c(
  "key1=value1",
  "value_only_no_key",
  "key2=value2=containing=equal=signs"
)

# Desired outcome:
result <- matrix(c(
    "key1", "value1",
    "", "value_only_no_key",
    "key2", "value2=containing=equal=signs"
), ncol=2, byrow= TRUE)



# The non-optionality of = results in no match for #2
str_match(
  data,
  "(.*?)=(.*)"
)[,-1]

# Same here
str_match(
  data,
  "([^=]*?)=(.*)"
)[,-1]

# The optionality of =? lets the greedy capture 2 eat everything
str_match(
  data,
  "(.*?)=?(.*)"
)[,-1]

# This is better than nothing, but value_only_no_key ends up in capture group 1
str_match(
  data,
  "([^=]*+)=?+(.*)"
)[,-1]

How about using an optional (?) non-capturing group (?:) anchored to the start of the string (^)?

str_match(data,
          "^(?:(.*?)=)?(.*)"
          )[,-1]
     [,1]   [,2]                           
[1,] "key1" "value1"                       
[2,] NA     "value_only_no_key"            
[3,] "key2" "value2=containing=equal=signs"
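
The pattern works because the key and the = sit together inside the optional group, so they are either matched as a pair or skipped entirely. A group that does not participate in the match is reported as NA, whereas the desired outcome in the question uses "". A minimal sketch of that last step, assuming a plain NA replacement is acceptable (the variable name m is only illustrative):

```r
library(stringr)

data <- c(
  "key1=value1",
  "value_only_no_key",
  "key2=value2=containing=equal=signs"
)

# Drop the full-match column, keeping only capture groups 1 and 2
m <- str_match(data, "^(?:(.*?)=)?(.*)")[, -1]

# The optional key group did not participate for row 2, so it is NA;
# replace NA with "" to mirror the desired outcome from the question
m[is.na(m)] <- ""
m
```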

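If you know that the key comes before the first occurrence of the equals sign, you can use a negated character class to match any character except =.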

If you do not want to match empty strings and the value should have at least one character:

^(?:([^\s=]+)=)?(.+)

If the key can also contain spaces, you can exclude newlines instead of all whitespace characters.

^(?:([^\r\n=]+)=)?(.+)
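
To illustrate the difference between the two character classes, here is a small sketch with made-up inputs (the vector spaced and its contents are assumptions, not data from the question). Note that inside an R string literal each regex backslash has to be doubled.

```r
library(stringr)

# Hypothetical inputs: one key containing a space, one bare value
spaced <- c("my key=some value", "just a value")

# [^\s=]+ cannot cross the space in "my key", so the optional key
# group is skipped and the whole string lands in group 2
strict <- str_match(spaced, "^(?:([^\\s=]+)=)?(.+)")[, -1]

# [^\r\n=]+ only excludes line breaks and =, so spaces are allowed in the key
relaxed <- str_match(spaced, "^(?:([^\\r\\n=]+)=)?(.+)")[, -1]

strict
relaxed
```

With the relaxed class, the first input should split into "my key" and "some value", while the bare value still ends up in group 2 with NA for the key.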

Example

library(stringr)

data <- c(
  "key1=value1",
  "value_only_no_key",
  "key2=value2=containing=equal=signs"
)

# The regex backslash must be doubled inside an R string literal
str_match(data,
          "^(?:([^\\s=]+)=)?(.+)"
)[,-1]

Output

     [,1]   [,2]                           
[1,] "key1" "value1"                       
[2,] NA     "value_only_no_key"            
[3,] "key2" "value2=containing=equal=signs"
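
The patterns in this thread give the same result on the sample data, but they differ on edge cases. The sketch below compares the lazy pattern with the stricter negated-class pattern on two made-up inputs, one with an empty key and one with an empty value (the vector edge is an assumption for illustration only):

```r
library(stringr)

# Hypothetical edge cases: an empty key and an empty value
edge <- c("=value", "key=")

# Lazy key and possibly empty value: both key and value may be ""
lazy <- str_match(edge, "^(?:(.*?)=)?(.*)")[, -1]

# Non-empty key without whitespace and non-empty value: when either
# part would be empty, the key group is skipped and the whole input
# falls into group 2
strict <- str_match(edge, "^(?:([^\\s=]+)=)?(.+)")[, -1]

lazy
strict
```

Which behavior is preferable depends on whether inputs such as "=value" or "key=" should count as key/value pairs at all.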