How do I remove numeric patterns of a certain length from a string in R?
Suppose I have the string:
some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"
I want to remove numeric patterns that are 7, 8, or 4 characters long, unless the number is 1000. So basically I want the following result:
"this is a string with some numbers 1000"
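Use gsub here with the regex pattern \b(?:\d{7,8}|(?!1000\b)\d{4})\b (inside an R string literal every backslash must be doubled, and perl=TRUE is needed for the lookahead):
some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"
# Remove 7- or 8-digit numbers, and 4-digit numbers other than 1000
output <- gsub("\\b(?:\\d{7,8}|(?!1000\\b)\\d{4})\\b", "", some_string, perl=TRUE)
output
[1] "this is a string with some numbers   1000  "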
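The pattern can also be checked one token at a time. The following is a minimal sketch, assuming R 4.0.0 or later so that the raw-string syntax r"(...)" is available and the regex can be written without doubled backslashes; the names pattern, tokens and matched are illustrative and not part of the original answer.

```r
# Raw string: the regex is written exactly as shown in the text above.
pattern <- r"(\b(?:\d{7,8}|(?!1000\b)\d{4})\b)"

# Illustrative tokens: lengths 7, 8, 4 (including 1000), 5 and 3.
tokens <- c("9639998", "21057535", "1000", "2021", "2022", "20022", "123")

# TRUE means the token would be removed by gsub() with this pattern.
matched <- grepl(pattern, tokens, perl = TRUE)
print(data.frame(token = tokens, removed = matched))
# removed: TRUE TRUE FALSE TRUE TRUE FALSE FALSE
```

Only the 7-, 8- and 4-digit tokens match, and the negative lookahead (?!1000\b) exempts 1000 from the 4-digit branch.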
In fact, a better version that also tidies up the stray whitespace would look like this:
some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"
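# Replace each unwanted number, plus surrounding whitespace, with one space
output <- gsub("\\s*(?:\\d{7,8}|(?!1000\\b)\\d{4})\\s*", " ", some_string, perl=TRUE)
# Collapse repeated whitespace, then trim both ends
output <- gsub("^\\s+|\\s+$", "", gsub("\\s{2,}", " ", output))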
output
[1] "this is a string with some numbers 1000"
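The same cleanup can be wrapped in a small function. This is only a sketch under stated assumptions: remove_number_tokens is a hypothetical helper name, the raw-string syntax needs R 4.0.0 or later, and trimws() is swapped in for the anchored gsub() call above. It keeps the word boundaries from the first pattern so that longer digit runs are left untouched.

```r
# Hypothetical helper: strip the unwanted numbers, then normalise whitespace.
remove_number_tokens <- function(x) {
  pattern <- r"(\b(?:\d{7,8}|(?!1000\b)\d{4})\b)"
  stripped <- gsub(pattern, "", x, perl = TRUE)
  # Collapse every run of whitespace to one space, then trim both ends.
  trimws(gsub("\\s+", " ", stripped))
}

some_string <- "this is a string with some numbers 9639998 21057535 1000 2021 2022"
cleaned <- remove_number_tokens(some_string)
print(cleaned)
# [1] "this is a string with some numbers 1000"
```

Because gsub() and trimws() are vectorised, the helper also accepts a character vector, for example remove_number_tokens(c("a 2021 b", "1000 12345")).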
A stringr option that keeps 1000 and any number whose length is not 4, 7, or 8. (The sample data includes one number of length 5.)
library(stringr)
"this is a string with some numbers 9639998 21057535 1000 2021 20022 2022" |>
str_remove_all("(?!1000)\\b(\\d{7,8}|\\d{4})\\b") |>
str_squish()
#> [1] "this is a string with some numbers 1000 20022"
Created on 2022-05-17 by the reprex package (v2.0.1)
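One edge case worth checking: a lookahead written as (?!1000) with no trailing \b also protects any 7- or 8-digit number that merely starts with 1000. The sketch below compares the two forms in base R with perl = TRUE rather than stringr, on the assumption that both engines treat these constructs the same way; the input x and the names loose, strict and squish are illustrative only.

```r
x <- "keep 1000 drop 10001234 and 2021"

loose  <- r"((?!1000)\b(\d{7,8}|\d{4})\b)"    # lookahead without a boundary
strict <- r"(\b(?!1000\b)(\d{7,8}|\d{4})\b)"  # exempts only the exact token 1000

# Illustrative helper: collapse whitespace and trim, like str_squish().
squish <- function(s) trimws(gsub("\\s+", " ", s))

loose_out  <- squish(gsub(loose,  "", x, perl = TRUE))
strict_out <- squish(gsub(strict, "", x, perl = TRUE))

print(loose_out)
# [1] "keep 1000 drop 10001234 and"
print(strict_out)
# [1] "keep 1000 drop and"
```

If such numbers cannot occur in the data, the two forms behave identically on the example string.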