Formatting and Replacing Multiple Dates within a Single String in R
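My problem is very similar to an existing question. The difference is that a single string of my text can contain several dates. All the dates share the same format, as shown below: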
rep <- "on the evening of june 11 2022, i was too tired to complete my homework that was due on august 4 2022. on august 25 2022 there will be a test "
All of my sentences are lowercase, and every date follows the %B %d %Y format. I can extract all the dates with the following code:
> pattern <- paste(month.name, "[:digit:]{1,2}", "[:digit:]{4}", collapse = "|") %>%
regex(ignore_case = TRUE)
> str_extract_all(rep, pattern)
[[1]]
[1] "june 11 2022" "august 4 2022" "august 25 2022"
What I want to do is replace every date in %B %d %Y format with the same date in %Y-%m-%d format. I tried something like this:
str_replace_all(rep, pattern, as.character(as.Date(str_extract_all(rep, pattern),format = "%B %d %Y")))
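This throws the error do not know how to convert 'str_extract_all' to class "Date". That makes sense to me, because I am trying to substitute several different dates and R does not know which date to use for which match.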
If I change str_extract_all to str_extract, I get:
"on the evening of 2022-06-11, i was too tired to complete my homework that was due on 2022-06-11. on 2022-06-11 there will be a test "
This also makes sense, because str_extract takes only the first date, converts its format, and then uses that one date for every match.
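The sketch below shows what goes wrong in the attempts above. It assumes stringr is installed and an English locale, so that month names parse. The input is stored as `txt` rather than `rep`. `str_extract_all()` returns a list, and `as.Date()` cannot convert a list directly. Unwrapping it with `[[1]]` fixes the conversion. However, a three-element replacement is then recycled against the single input string, so the result is three strings, each using one date for every match, instead of one correctly rewritten string.

```r
library(stringr)

txt <- "on the evening of june 11 2022, i was too tired to complete my homework that was due on august 4 2022. on august 25 2022 there will be a test "
pattern <- regex(paste(month.name, "[:digit:]{1,2}", "[:digit:]{4}", collapse = "|"),
                 ignore_case = TRUE)

found <- str_extract_all(txt, pattern)
class(found)        # "list": one element per input string

# Unwrap the list before converting (parsing month names assumes an English locale)
dates <- as.character(as.Date(found[[1]], format = "%B %d %Y"))

# A length-3 replacement is recycled against the length-1 input:
# copy i of the string has every match replaced by dates[i]
attempt <- str_replace_all(txt, pattern, dates)
length(attempt)     # 3
```

The first element of `attempt` is the same result the question gets from `str_extract`. Recycling alone therefore cannot map each match to its own converted date.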
I would prefer a solution that uses the stringr package, since most of my string wrangling so far has used it, but I am 100% open to any solution that gets the job done.
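We can capture the pattern as a group ((...)): one or more word characters (\w+), followed by a space, one or two digits (\d{1,2}), another space, and four digits (\d{4}). We then pass a function as the replacement that converts the captured text to Date class: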
library(stringr)
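# Regex backslashes must be doubled inside R strings; convert the Date back to character
str_replace_all(rep, "(\\w+ \\d{1,2} \\d{4})", function(x) as.character(as.Date(x, "%b %d %Y")))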
-output
[1] "on the evening of 2022-06-11, i was too tired to complete my homework that was due on 2022-08-04. on 2022-08-25 there will be a test "
Note: it is better to give the object a different name, because rep is a base R function name.
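The pattern \w+ \d{1,2} \d{4} matches any word followed by a number and a year, for example "chapter 3 2021". `as.Date()` cannot parse such text and returns `NA` for it. The sketch below keeps the function-replacement idea but restricts matches to real month names, as in the question's pattern. The helper name `iso_dates` is hypothetical, and an English locale is assumed.

```r
library(stringr)

# Hypothetical helper: rewrites every "month day year" date in the character
# vector `x` as ISO 8601 "YYYY-MM-DD". Assumes English month names / locale.
iso_dates <- function(x) {
  months <- paste(month.name, collapse = "|")
  # Only a real month name, then a 1-2 digit day, then a 4-digit year
  pattern <- regex(paste0("\\b(", months, ") \\d{1,2} \\d{4}\\b"),
                   ignore_case = TRUE)
  str_replace_all(x, pattern, function(m) {
    # Receives the matched text; format() turns the Dates back into character
    format(as.Date(m, format = "%B %d %Y"))
  })
}

txt <- c("on the evening of june 11 2022, i was too tired to complete my homework that was due on august 4 2022. on august 25 2022 there will be a test ",
         "chapter 3 2021 is unaffected")
iso_dates(txt)
```

Because the function receives only the matched text, each date is converted independently. Strings without a month-name date pass through unchanged.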
You can pass a named vector with multiple replacements to str_replace_all():
library(stringr)
rep <- "on the evening of june 11 2022, i was too tired to complete my homework that was due on august 4 2022. on august 25 2022 there will be a test "
pattern <- paste(month.name, "[:digit:]{1,2}", "[:digit:]{4}", collapse = "|") %>%
regex(ignore_case = TRUE)
extracted <- str_extract_all(rep, pattern)[[1]]
replacements <- setNames(as.character(as.Date(extracted, format = "%B %d %Y")),
extracted)
str_replace_all(rep, replacements)
#> [1] "on the evening of 2022-06-11, i was too tired to complete my homework that was due on 2022-08-04. on 2022-08-25 there will be a test "
Created on 2022-05-26 by the reprex package (v2.0.1)
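Because of the `[[1]]`, the named-vector answer above handles a single string. The sketch below extends it to a character vector under the same assumptions (stringr installed, English locale); the example sentences are illustrative. It collects the unique matches across all strings and builds one lookup vector. `str_replace_all()` treats each name as a regular expression, which is safe here because the extracted dates contain no regex metacharacters.

```r
library(stringr)

sentences <- c(
  "on the evening of june 11 2022, i was too tired to complete my homework that was due on august 4 2022.",
  "on august 25 2022 there will be a test",
  "no dates here"
)

pattern <- regex(paste(month.name, "[:digit:]{1,2}", "[:digit:]{4}", collapse = "|"),
                 ignore_case = TRUE)

# Unique date strings found anywhere in the vector
extracted <- unique(unlist(str_extract_all(sentences, pattern)))

# Named vector: names are the texts to find, values the ISO replacements
replacements <- setNames(format(as.Date(extracted, format = "%B %d %Y")), extracted)

# Guard the case where no sentence contains a date
out <- if (length(extracted) > 0) str_replace_all(sentences, replacements) else sentences
out
```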