Regex in R lookbehind assertion

Question

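I'm trying to do some pattern matching with the extract function from tidyr. I tested my regex on a regex practice site and the pattern seems to work there; it uses a lookbehind assertion.

I have the following sample text: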

=[\"{ Key = source, Values = web,videoTag,assist }\",\"{ Key = type, 
Values = attack }\",\"{ Key = team, Values = 2 }\",\"{ Key = 
originalStartTimeMs, Values = 56496 }\",\"{ Key = linkId, Values = 
1551292895649 }\",\"{ Key = playerJersey, Values = 8 }\",\"{ Key = 
attackLocationStartX, Values = 3.9375 }\",\"{ Key = 
attackLocationStartY, Values = 0.739376770538243 }\",\"{ Key = 
attackLocationStartDeflected, Values = false }\",\"{ Key = 
attackLocationEndX, Values = 1.7897727272727275 }\",\"{ Key = 
attackLocationEndY, Values = -1.3002832861189795 }\",\"{ Key = 
attackLocationEndDeflected, Values = false }\",\"{ Key = lastModified, 
Values = web,videoTag,assist

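I want to grab the number after attackLocationX (that is, all of the numbers that follow the attack-location keys).

However, with the following code, which uses a lookbehind, I get no results: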

df %>% 
extract(message, "x_start",'((?<=attackLocationStartX,/sValues/s=/s)[0-9.]+)')

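This function returns NA when no match is found, and my target column is all NA even though I tested the pattern on www.regexr.com. According to the documentation, R pattern matching supports lookbehind, so I'm not sure what else to try here.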

Answer 1

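I'm not sure about the lookbehind part, but in R you need to escape backslashes. That isn't obvious if the regex checker you use is not R-specific.

More info here.

So you probably want your regex to look like:

"((?<=attackLocationStartX,\\sValues\\s=\\s)[0-9.]+)"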
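To make the escaping point concrete, here is a minimal sketch in base R. It assumes a shortened copy of the sample text stored in a hypothetical variable `msg`, and it uses `regexpr()` with `perl = TRUE` rather than tidyr, only to show how the doubled backslashes and the `/s` typo behave.

```r
# Shortened sample string based on the question (variable name is illustrative).
msg <- "{ Key = attackLocationStartX, Values = 3.9375 }"

# In an R string literal, "\\s" is the two characters \s that the regex engine sees.
pattern <- "(?<=attackLocationStartX,\\sValues\\s=\\s)[0-9.]+"

# perl = TRUE selects PCRE, which supports lookbehind assertions.
m <- regmatches(msg, regexpr(pattern, msg, perl = TRUE))
print(m)

# The "/s" version from the question looks for a literal slash followed by "s",
# so it finds nothing; regexpr() reports that as -1.
bad <- "(?<=attackLocationStartX,/sValues/s=/s)[0-9.]+"
no_match <- regexpr(bad, msg, perl = TRUE)
print(as.integer(no_match))
```

With the corrected pattern, `m` holds the matched number as a string; with the `/s` pattern nothing matches, which is consistent with the all-NA column described in the question.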

Answer 2

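First, to match whitespace you need \s, not /s.

You don't have to use a lookbehind here, because extract returns the captured substring when the pattern contains a capturing group.

Use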

df %>% 
  extract(message, "x_start", "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d+\\.\\d+)")

Output: 3.9375.

The regex may also look like "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d[.0-9]*)".

Because the (-?\d+\.\d+) part is captured, only the text in that group is output.
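The capture-group approach can be sketched end to end as follows. This assumes tidyr is installed; the one-row data frame `df` and its shortened `message` value are made up for illustration and are not the asker's actual data.

```r
library(tidyr)

# Hypothetical one-row data frame holding a shortened version of the sample text.
df <- data.frame(
  message = "{ Key = attackLocationStartX, Values = 3.9375 }, { Key = attackLocationStartY, Values = 0.739376770538243 }",
  stringsAsFactors = FALSE
)

# One capturing group in the pattern, so one output column ("x_start").
res <- tidyr::extract(
  df, message, "x_start",
  "attackLocationStartX\\s*,\\s*Values\\s*=\\s*(-?\\d+\\.\\d+)"
)
print(res$x_start)
```

The new column is character by default; passing `convert = TRUE` to extract asks it to convert the result to a more specific type where possible.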

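Pattern details

(-?\d+\.\d+) - the capturing group, which matches:
- -? - an optional hyphen (? means 1 or 0 occurrences)
- \d+ - 1 or more digits (+ means 1 or more)
- \. - a dot
- \d+ - 1 or more digits
\d[.0-9]* - a digit (\d) followed by 0 or more dots or digits ([.0-9]*)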
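A short sketch of how the two number patterns differ, using values that appear in the sample text. The `^` and `$` anchors are added here only so that each value is tested as a whole; they are not part of the patterns in the answer.

```r
# Values taken from the sample text: two decimals and one integer (playerJersey).
values <- c("3.9375", "-1.3002832861189795", "8")

strict <- "^-?\\d+\\.\\d+$"   # requires a dot with digits on both sides
loose  <- "^-?\\d[.0-9]*$"    # one digit, then any run of dots or digits

strict_ok <- grepl(strict, values, perl = TRUE)
loose_ok  <- grepl(loose,  values, perl = TRUE)

print(strict_ok)  # TRUE TRUE FALSE
print(loose_ok)   # TRUE TRUE TRUE
```

The stricter pattern skips integer values such as 8, while the looser one accepts them but would also accept malformed input such as "1..2", so the choice depends on what the Values fields can contain.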
