How to isolate year from mmddyyyy string with no available delimiter?

Is there a way to get the date out of these strings? I only want to isolate the year (e.g., 2019, 2020, 2021).

For example: USP_03182019_H13

A tidy-friendly (tidyverse) answer would be ideal.

date <- c("USP_03182019_H13","DED_03212019_H1","EL_03202019_H8","EL_10082020_H6","DSP_05122021_H5")

#              date
#1 USP_03182019_H13
#2  DED_03212019_H1
#3   EL_03202019_H8
#4   EL_10082020_H6
#5  DSP_05122021_H5

I'm sure there is a regex-based way, but this does the job:

library(magrittr)
date %>% readr::parse_number %>% substr(., nchar(.)-3, nchar(.))

An alternative to Ben's solution:

library(stringi)
stri_sub(date, stri_locate_last_regex(date, "\\d{4}"))

Output:

[1] "2019" "2019" "2019" "2020" "2021"

library(lubridate)
year(mdy(readr::parse_number(date)))
[1] 2019 2019 2019 2020 2021

sub('.*(\\d{4})_.*', '\\1', date)
[1] "2019" "2019" "2019" "2020" "2021"

stringr::str_extract(date, '\\d{4}(?=_)')
[1] "2019" "2019" "2019" "2020" "2021"
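The two patterns above return whatever four digits precede the last underscore, even if the string is not shaped like the examples. As a rough sketch in base R, assuming every valid value looks like prefix_mmddyyyy_suffix, an anchored pattern can return NA for anything else. The names `pattern` and `yr_strict` are made up for illustration.

```r
date <- c("USP_03182019_H13", "DED_03212019_H1", "EL_03202019_H8",
          "EL_10082020_H6", "DSP_05122021_H5")

# Assumed layout: prefix, "_", two-digit month, two-digit day,
# four-digit year (captured), "_", suffix.
pattern <- "^[^_]+_[0-9]{2}[0-9]{2}([0-9]{4})_.*$"

# Keep the captured year where the whole string matches, NA otherwise.
yr_strict <- ifelse(grepl(pattern, date),
                    sub(pattern, "\\1", date),
                    NA_character_)
yr_strict
```

With this version a string such as "bad_string" gives NA instead of being passed through unchanged, which is what plain sub() does when nothing matches.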

A gsub solution:

gsub(".*_[[:digit:]]{4}|_.*","",date)
[1] "2019" "2019" "2019" "2020" "2021"

Using stringr and dplyr, since you asked for a tidyr-style solution. Not as neat as a one-liner, but hopefully easy to follow for non-regex experts (like me).

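library(stringr)
library(dplyr)

# The question only defines the vector `date`, so wrap it in a data frame
dat <- data.frame(date = date)

get_date = function(x) {
    # Take the middle piece between the underscores, e.g. "03182019"
    numbers = str_split(x, "_", simplify = TRUE)[, 2]
    # Keep its last four characters, i.e. the year
    unlist(str_extract_all(numbers, ".{4}$"))
}

dat %>%
    mutate(date = get_date(date))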
  date
1 2019
2 2019
3 2019
4 2020
5 2021

Another approach is to split the character vector, select the second element of each piece, and then extract the year.

substr(sapply(strsplit(date, split = '_'), "[[", 2), 5, 8)
#"2019" "2019" "2019" "2020" "2021"
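If it helps, here is a base-R sketch that needs no extra packages and parses the eight digits as a real date, so a malformed value ends up as NA instead of an arbitrary four-character slice. It assumes each string contains one run of eight digits in mmddyyyy order. The names `stamp`, `parsed` and `yr` are made up for illustration.

```r
date <- c("USP_03182019_H13", "DED_03212019_H1", "EL_03202019_H8",
          "EL_10082020_H6", "DSP_05122021_H5")

# Pull out the first run of eight digits (assumed to be mmddyyyy).
# Note: regmatches() silently drops elements that have no match.
stamp <- regmatches(date, regexpr("[0-9]{8}", date))

# Parse as a Date; values that are not valid dates become NA.
parsed <- as.Date(stamp, format = "%m%d%Y")

# Format the year back out and convert it to an integer.
yr <- as.integer(format(parsed, "%Y"))
yr
```

Keeping `parsed` around is handy if the month or day is needed later, since the same format() call works with "%m" or "%d".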