How to extract the last 4 digits of a string in R

Question

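I want to extract the last 4 digits of a given string, but I can't figure out how. The last 4 digits can be either "XXXX" or "XXXX-". Ultimately, I have a list of heterogeneous entries that includes single years (e.g. 2001 or 2001-), lists of years (e.g. 2001, 2004-), ranges of years (e.g. 2001-2010), or combinations of these, with or without a dash ("-") at the end of the entry.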

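I know that in regular expressions "$" is the token that marks the END of a string and "^" marks the START. I can extract the first 4 digits easily. Here is an example of what I was able to do, along with the code that does not work for the last 4 digits: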

library(stringr)
test <- c("2009-", "2008-2015", "2001-, 2003-2010, 2012-")
str_extract_all(test, "^[[:digit:]]{4}") # Extracts FIRST 4

[[1]]
[1] "2009"

[[2]]
[1] "2008"

[[3]]
[1] "2001"
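Because the pattern is anchored with "^", each string can yield at most one match, so the list returned by str_extract_all() holds a single element per string. A small sketch, an editorial addition rather than part of the original question, uses str_extract(), which returns only the first match per string, to get a flat character vector directly:

```r
library(stringr)

test <- c("2009-", "2008-2015", "2001-, 2003-2010, 2012-")

# The "^" anchor allows at most one match per string, so str_extract()
# (first match only) returns a plain character vector instead of a list.
first4 <- str_extract(test, "^[[:digit:]]{4}")
first4
#> [1] "2009" "2008" "2001"
```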

str_extract_all(test, "[[:digit:]]{4}$") # Does not extract LAST 4

[[1]]
character(0)

[[2]]
[1] "2015"

[[3]]
character(0)

str_extract_all(test, "\\d{4}$")

[[1]]
character(0)

[[2]]
[1] "2015"

[[3]]
character(0)
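Both attempts fail on "2009-" and on the third string because "$" requires the four digits to be the very last characters, and those strings end with "-". A sketch of one possible fix, not taken from the original thread, keeps the "$" anchor but uses a lookahead to allow an optional trailing dash. It assumes stringr's ICU regex engine, which supports lookahead assertions:

```r
library(stringr)

test <- c("2009-", "2008-2015", "2001-, 2003-2010, 2012-")

# "\\d{4}" matches four digits; the lookahead "(?=-?$)" requires them to be
# followed by an optional "-" and then the end of the string, without
# including the dash in the match. The backslash is doubled because "\d"
# is not a valid escape inside an R string literal.
last4 <- str_extract(test, "\\d{4}(?=-?$)")
last4
#> [1] "2009" "2015" "2012"
```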

The result I want is:

[1] "2009" "2015" "2012"
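Another way to get this result, sketched here as an assumption rather than the asker's or the answerer's method, is to extract every four-digit run and keep the last one from each string. The helper last_or_na() below is hypothetical. It returns NA for strings that contain no four-digit run:

```r
library(stringr)

test <- c("2009-", "2008-2015", "2001-, 2003-2010, 2012-")

# Hypothetical helper: return the last element of a character vector,
# or NA when the vector is empty (no four-digit run was found).
last_or_na <- function(x) if (length(x) > 0) x[length(x)] else NA_character_

# str_extract_all() returns one character vector of matches per input string.
all_years <- str_extract_all(test, "[[:digit:]]{4}")
last4 <- vapply(all_years, last_or_na, character(1))
last4
#> [1] "2009" "2015" "2012"
```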

Answer 1

We can use sub():

# In an R string the regex escapes must be doubled: "\\d" and "\\1"
sub(".*(\\d{4}).*$", "\\1", test)
#[1] "2009" "2015" "2012"
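This works because the leading ".*" is greedy. It consumes as much of the string as it can while still leaving four digits for the capture group, so the group ends up holding the last four-digit run. The stringi package, which stringr is built on, offers stri_extract_last_regex() as an alternative that the answer does not mention. The following sketch compares the two:

```r
library(stringi)  # stringr is built on stringi, so it is normally installed

test <- c("2009-", "2008-2015", "2001-, 2003-2010, 2012-")

# Base R: greedy ".*" leaves only the last four-digit run for "(\\d{4})",
# and "\\1" replaces the whole string with that captured group.
via_sub <- sub(".*(\\d{4}).*$", "\\1", test)

# stringi: return the last match of the pattern in each string.
via_stringi <- stri_extract_last_regex(test, "\\d{4}")

via_sub
#> [1] "2009" "2015" "2012"
identical(via_sub, via_stringi)
#> [1] TRUE
```

The two approaches differ when a string contains no four-digit run. In that case sub() returns the string unchanged, whereas stri_extract_last_regex() returns NA.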
