Extract text from string in R

Question

I have a lot of strings that all look similar, for example:

x1= "Aaaa_11111_AA_Whatiwant.txt"
x2= "Bbbb_11111_BBBB_Whatiwanttoo.txt"
x3= "Ccc_22222_CC_Whatiwa.txt"

In R, I want to extract: Whatiwant, Whatiwanttoo and Whatiwa.

I started with substring(x1,15,23), but I don't know how to generalize it. How can I always extract the part between the last _ and .txt?

Thanks!

Answer 1

You can use a regexp capture group:

gsub(".*_([^_]*)\\.txt","\\1",x1)
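Since sub() and gsub() are vectorised, the same capture-group idea can be applied to all three strings in one call. Below is a minimal sketch, assuming the strings are collected into a vector (the names `x` and `parts` are just for illustration). It uses sub() instead of gsub() because the pattern can match at most once per string, and adds a `$` anchor so ".txt" must be the actual ending:

```r
# The three example strings from the question, collected into one vector
x <- c("Aaaa_11111_AA_Whatiwant.txt",
       "Bbbb_11111_BBBB_Whatiwanttoo.txt",
       "Ccc_22222_CC_Whatiwa.txt")

# ".*_"        greedily consumes everything up to the LAST underscore
# "([^_]*)"    captures the remaining run of non-underscore characters
# "\\.txt$"    requires a literal ".txt" at the very end of the string
# "\\1"        replaces the whole match with the captured group
parts <- sub(".*_([^_]*)\\.txt$", "\\1", x)

# returns c("Whatiwant", "Whatiwanttoo", "Whatiwa")
parts
```

Note that sub() returns an element unchanged when the pattern does not match, so a string without a trailing ".txt" would come back whole rather than as NA.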

Answer 2

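You can also use the stringr library with functions such as str_extract (among many other options), which can be handy if you are less comfortable writing regular expressions by hand. It is very convenient.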

x1= "Aaaa_11111_AA_Whatiwant.txt"
x2= "Bbbb_11111_BBBB_Whatiwanttoo.txt"
x3= "Ccc_22222_CC_Whatiwa.txt"
library(stringr)
patron <- "(What)[a-z]+"
str_extract(x1, patron)
## [1] "Whatiwant"
str_extract(x2, patron)
## [1] "Whatiwanttoo"
str_extract(x3, patron)
## [1] "Whatiwa"
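The pattern "(What)[a-z]+" above works for these three examples only because every target starts with "What" followed by lowercase letters. A pattern keyed on position (the segment right before ".txt") does not depend on the content. Here is a minimal base-R sketch using regexpr() with a Perl-style lookahead and regmatches(); the variable names are illustrative. The same lookahead pattern should also work with stringr's str_extract(), since its ICU regex engine supports lookahead, but the sketch below sticks to base R:

```r
x <- c("Aaaa_11111_AA_Whatiwant.txt",
       "Bbbb_11111_BBBB_Whatiwanttoo.txt",
       "Ccc_22222_CC_Whatiwa.txt")

# perl = TRUE enables PCRE, which supports the lookahead "(?=...)".
# "[^_]+" matches a run of non-underscore characters; the lookahead
# requires that run to be followed immediately by ".txt" at the end,
# so only the last segment matches and ".txt" is not part of the match.
m <- regexpr("[^_]+(?=\\.txt$)", x, perl = TRUE)

# regmatches() pulls out the matched substrings (non-matches are dropped)
parts <- regmatches(x, m)

# returns c("Whatiwant", "Whatiwanttoo", "Whatiwa")
parts
```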
