Characters are still captured and highlighted despite using a non-capturing group
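I am using regular expressions in R and am trying to capture a specific measurement of heart wall thickness, with 1-2 integer digits and 0-2 decimal places, for example:
"maximum thickness of lv wall= 1.5"
But I want to exclude cases where (after|myectomy|resection) appears somewhere after the word "thickness".
So I wrote the following regex: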
pattern <- "(?i)(?<=thickness)(?!(\\s{0,10}[[:alpha:]]{1,100}){0,8}\\s{0,10}(after|myectomy|resection))(?:(?:\\s{0,10}[[:alpha:]]{0,100}){0,8}\\s{0,10}[:=\\(]?)\\d{1,3}\\.?\\d{0,3}"
You can test it against this sample data frame (every measurement in this sample should match, except the last one):
library(tibble)
df <- tibble(
test = c("maximum size of thickness in base to mid of anteroseptal wall(1.7cm)",
"(anterolateral and inferoseptal wall thickness:1.6cm)",
"hypertrophy in apical segments maximom thickness=1.6cm with sparing of posterior wall",
"septal thickness=1cm",
"LV apical segments with maximal thickness 1.7 cm and dynamic",
"septal thickness after myectomy=1cm")
)
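This regex matches what I want. The problem is that I only want to capture the measurement itself, but the text between "thickness" and the measurement is captured as well, even though I marked that part as a non-capturing group with ?:.
[Image: result of stringr::str_view(df$test, pattern)]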
You can use
library(stringr)
pattern <- "(?i)(?<=\\bthickness(?:\\s{1,10}(?!(?:after|myectomy|resection)\\b)[a-zA-Z]{1,100}){0,8}\\s{0,10}[:=(]?)\\d{1,3}(?:\\.\\d{1,3})?"
str_view(df$test, pattern)
Output: [image]
See the regex demo (the JavaScript engine in modern browsers supports unbounded-length lookbehind).
Details:
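(?<=
- start of a positive lookbehind that requires the following sequence of patterns to match immediately to the left of the current position:
\bthickness
- the word thickness, preceded by a word boundary
(?:\s{1,10}(?!(?:after|myectomy|resection)\b)[a-zA-Z]{1,100}){0,8}
- zero to eight occurrences of:
\s{1,10}
- one to ten whitespace characters
(?!(?:after|myectomy|resection)\b)
- a negative lookahead that fails the match if the whole word after, myectomy or resection appears immediately to the right of the current position
[a-zA-Z]{1,100}
- one to 100 ASCII letters
\s{0,10}
- zero to ten whitespace characters
[:=(]?
- an optional :, = or ( character
)
- end of the positive lookbehind
\d{1,3}
- one to three digits
(?:\.\d{1,3})?
- an optional sequence of a . and one to three digits.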
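The reason the original pattern highlighted more than the number is that a non-capturing group (?:...) only avoids creating a numbered group. The text it matches is still consumed and remains part of the overall match, whereas text matched inside a lookbehind is not. The following is a minimal base R sketch of that difference, assuming a shortened sample string and two simplified fixed-length patterns that are for illustration only (they are not the patterns from the question or the answer):

```r
x <- "septal thickness=1cm"

# Non-capturing group: "thickness=" is consumed, so it is part of the match.
with_group <- regmatches(x, regexpr("(?:thickness=)\\d", x, perl = TRUE))

# Lookbehind: "thickness=" must precede the digit but is not consumed.
with_lookbehind <- regmatches(x, regexpr("(?<=thickness=)\\d", x, perl = TRUE))

print(with_group)       # "thickness=1"
print(with_lookbehind)  # "1"
```

This is why moving the "words between thickness and the number" part inside the lookbehind leaves only the measurement highlighted.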
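Perhaps you could break it down into smaller steps.
# 1. Grab everything from "thickness" up to and including the first number
tt <- regmatches(s, regexpr("(?i)thickness.*?\\d{1,3}(\\.\\d{1,3})?", s, perl = TRUE))
# 2. Set matches that mention an excluded word to NA
is.na(tt) <- grep("after|myectomy|resection", tt)
# 3. Strip the leading non-digit text so only the number remains
sub("[^0-9]*", "", tt)
#[1] "1.7" "1.6" "1.6" "1" "1.7" NA
Data: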
s <- c("maximum size of thickness in base to mid of anteroseptal wall(1.7cm)",
"(anterolateral and inferoseptal wall thickness:1.6cm)",
"hypertrophy in apical segments maximom thickness=1.6cm with sparing of posterior wall",
"septal thickness=1cm",
"LV apical segments with maximal thickness 1.7 cm and dynamic",
"septal thickness after myectomy=1cm")
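One caveat with the stepwise approach: regmatches() combined with regexpr() drops elements that have no match at all, so on other data the result can be shorter than the input. The sketch below keeps the output aligned with the input by filling non-matches with NA. The function name extract_thickness and the extra test strings are hypothetical and are not part of either answer.

```r
# Hypothetical helper: returns one value per input element (NA if no usable match).
extract_thickness <- function(s) {
  m <- regexpr("(?i)thickness.*?\\d{1,3}(\\.\\d{1,3})?", s, perl = TRUE)
  out <- rep(NA_character_, length(s))
  hit <- m != -1                      # elements that contain a match
  out[hit] <- regmatches(s, m)        # regmatches() returns matches only
  # Discard matches that mention an excluded word
  out[grepl("after|myectomy|resection", out, ignore.case = TRUE)] <- NA
  # Strip the leading non-digit text so only the number remains
  sub("^[^0-9]*", "", out)
}

res <- extract_thickness(c("septal thickness=1cm",
                           "no measurement here",
                           "septal thickness after myectomy=1cm"))
print(res)
```

Because the result has the same length as the input, it can be assigned directly to a new column of a data frame.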