Info about the semantics in str_extract in R? (With an example)

Question

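I would like to understand how to use the semantics (the regular expression syntax) in str_extract from the stringr package in R.

I have strings written like this: "11_3_S11.html", and I want to extract the value after the first underscore. That is, I want to extract the number 3.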

files = c("11_3_S11.html")

I would be grateful if someone could explain the logic or send me a link that covers all of the semantics.

Thanks for your time.

Answer 1

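Use lookarounds: the lookbehind (?<=_) and the lookahead (?=_) require an underscore on each side of the digit without including either underscore in the match.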

str_extract("11_3_S11.html", '(?<=_)\\d(?=_)')
[1] "3"
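
As a minimal sketch of how the pieces of that pattern fit together, the version below uses \\d+ instead of \\d so that multi-digit values are also picked up. The second file name is a made-up example, not one from the question.

```r
library(stringr)

# The second name is hypothetical, added only to show a multi-digit value.
files <- c("11_3_S11.html", "7_42_S02.html")

# (?<=_)  lookbehind: the match must be preceded by "_" (not included in the result)
# \\d+    one or more digits (the backslash is doubled inside an R string)
# (?=_)   lookahead: the match must be followed by "_" (not included in the result)
result <- str_extract(files, "(?<=_)\\d+(?=_)")
result
#[1] "3"  "42"
```

str_extract returns only the first match in each string, which here is the number between the first and second underscore.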

Answer 2

In base R, you can use sub to extract the number after the first underscore.

sub('\\d+_(\\d+)_.*', '\\1', files)
#[1] "3"

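Here \d+ matches one or more digits (inside an R string literal the backslash has to be doubled, so it is written '\\d+').

The parentheses () form a capture group, which captures the value we are interested in; the replacement \1 refers back to that group, so the whole matched string is replaced by the captured number.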
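
One thing to keep in mind with this approach: sub returns the input unchanged when the pattern does not match. The sketch below illustrates that and one possible way to get NA instead; the second file name and the variable names are assumptions made for illustration.

```r
# The second name is hypothetical and deliberately does not match the pattern.
files <- c("11_3_S11.html", "readme.html")
pattern <- "\\d+_(\\d+)_.*"

# sub() replaces the matched text with capture group 1;
# strings that do not match are returned as they are.
extracted <- sub(pattern, "\\1", files)
extracted
#[1] "3"           "readme.html"

# Optional guard: keep the result only where the pattern actually matched.
extracted_safe <- ifelse(grepl(pattern, files), extracted, NA_character_)
extracted_safe
#[1] "3" NA
```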

If you would rather use stringr, you can use the same regular expression in str_match.

stringr::str_match(files, '\\d+_(\\d+)_.*')[, 2]
[1] "3"
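
To see why the [, 2] index is needed, here is a small sketch that inspects what str_match returns: a character matrix whose first column is the full match and whose following columns hold the capture groups. The variable name m is only for illustration.

```r
library(stringr)

files <- c("11_3_S11.html")

# str_match() returns a character matrix with one row per input string.
m <- str_match(files, "\\d+_(\\d+)_.*")

m[, 1]  # column 1: the full match
#[1] "11_3_S11.html"

m[, 2]  # column 2: the first capture group
#[1] "3"
```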
