Remove all leading string before two specific letters in R
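I am looking for a way to remove all of the leading string before two specific letter pairs, "bd" and "ls".
However, I have only found regex approaches that remove the string before whitespace or punctuation. Is there any way to remove the leading string before a specific pair of letters?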
   date_on    location
14 2021-02-22 bradford, west yorkshire, bd9 6dp
15 2021-02-22 bradford, bd4
16 2021-02-22 bradford, west yorkshire
17 2021-02-22 west yorkshire, bd1 1nq
18 2021-02-22 bradford, west yorkshire
19 2021-02-22 ls28 7he
Data (from dput):
df <- structure(list(date_on = structure(c(18680, 18680, 18680, 18680,
18680, 18680), class = "Date"), location = c("bradford, west yorkshire, bd9 6dp",
"bradford, bd4", "bradford, west yorkshire", "west yorkshire, bd1 1nq",
"bradford, west yorkshire", "ls28 7he")), row.names = 14:19, class = "data.frame")
Expected outcome:
   date_on    location
14 2021-02-22 bd9 6dp
15 2021-02-22 bd4
16 2021-02-22
17 2021-02-22 bd1 1nq
18 2021-02-22
19 2021-02-22 ls28 7he
structure(list(date_on = structure(c(18680, 18680, 18680, 18680,
18680, 18680), class = "Date"), location = c("bd9 6dp",
"bd4", "", "bd1 1nq", "", "ls28 7he")), row.names = 14:19, class = "data.frame")
We can try using sub here for a base R option:
df$location <- sub("^.*?(\\b(?:bd|ls)\\d+.*|$)$", "\\1", df$location)
df
   date_on    location
14 2021-02-22 bd9 6dp
15 2021-02-22 bd4
16 2021-02-22
17 2021-02-22 bd1 1nq
18 2021-02-22
19 2021-02-22 ls28 7he
Here is an explanation of the regex pattern used:
^                from the start of the location
.*?              consume all content up to, but not including
(                start capture group
    \b(?:bd|ls)  a postal code starting in 'bd' or 'ls'
    \d+          followed by one or more digits
    .*           consume the remainder of the location
    |            OR
    $            consume the remainder of any location NOT
                 having at least one postal code
)                stop capture group
$                end of the location
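To see the two branches of the capture group in action, here is a minimal sketch on a plain character vector. The sample values are copied from the question's location column. Passing perl = TRUE is an assumption added here, so that the lazy quantifier and the non-capturing group are handled by PCRE; it is not part of the answer's original call.

```r
# Sample values taken from the question's location column
location <- c("bradford, west yorkshire, bd9 6dp",
              "bradford, west yorkshire",
              "ls28 7he")

# Same pattern as in the answer; perl = TRUE is an assumption of this sketch
pattern <- "^.*?(\\b(?:bd|ls)\\d+.*|$)$"
result <- sub(pattern, "\\1", location, perl = TRUE)

print(result)
# [1] "bd9 6dp"  ""         "ls28 7he"
```

The second element comes back empty because only the `$` branch of the group can match there, so the captured text is the empty string.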
Another base R option with sub:
df$location <- sub('.*(?=bd|ls)|.*', '', df$location, perl = TRUE)
df
#   date_on    location
#14 2021-02-22 bd9 6dp
#15 2021-02-22 bd4
#16 2021-02-22
#17 2021-02-22 bd1 1nq
#18 2021-02-22
#19 2021-02-22 ls28 7he
This removes everything before the occurrence of 'bd|ls' in the string; if there is no occurrence, it removes everything.
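One thing worth knowing about the lookahead version is that it matches the letter pair anywhere, including inside an ordinary word. The sketch below compares it with a stricter variant that requires a word boundary before the pair and a digit after it. The stricter pattern and the "wells" input are hypothetical; they are not part of the original answer or data.

```r
# Hypothetical inputs: the second one contains "ls" inside an ordinary word
location <- c("bradford, bd4", "wells, west yorkshire")

# Pattern from the answer: the letter pair may appear anywhere
loose <- sub(".*(?=bd|ls)|.*", "", location, perl = TRUE)

# Assumed stricter variant: word boundary before the pair, digit after it
strict <- sub(".*(?=\\b(?:bd|ls)\\d)|.*", "", location, perl = TRUE)

print(loose)
# [1] "bd4"                "ls, west yorkshire"
print(strict)
# [1] "bd4" ""
```

For the data shown in the question both patterns give the same result; the difference only matters if place names in the real data happen to contain "bd" or "ls".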