Replace characters except certain strings with gsub
I am trying to replace the strings in a column that do not match a pattern, using the gsub function.
The data frame:
library(tidyverse)
df <- structure(list(partij_kort = c("COMBGB", "VVD", "GL", "NIEUWEL",
"CDA")), .Names = "partij_kort", row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
partij_kort
<chr>
1 COMBGB
2 VVD
3 GL
4 NIEUWEL
5 CDA
This code does the opposite of what I want:
df %>% mutate(new = gsub("VVD|GL|CDA|CU|D66|PVDA|CUSGP|SGP|PVDAGL",
"something",
partij_kort))
partij_kort new
<chr> <chr>
1 COMBGB COMBGB
2 VVD something
3 GL something
4 NIEUWEL NIEUWEL
5 CDA something
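I want every string that is not in the pattern (here COMBGB and NIEUWEL) to be changed to something.
But the exclamation mark ! does not work with gsub (I often use it with grepl).
Desired result: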
partij_kort new
<chr> <chr>
1 COMBGB something
2 VVD VVD
3 GL GL
4 NIEUWEL something
5 CDA CDA
What is the best way to do this?
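A note on why ! helps with grepl but not with gsub: grepl returns a logical vector, which ! can invert, whereas gsub returns the modified strings, so there is nothing to negate. The following is a minimal sketch with a shortened pattern and a three-element vector chosen only for illustration:

```r
# Shortened example data and pattern (illustrative only)
x <- c("COMBGB", "VVD", "GL")

# grepl() answers "does this element match?" with TRUE/FALSE
matches <- grepl("VVD|GL", x)

# A logical vector can be inverted with !
non_matches <- !matches

# gsub() returns character strings, so ! has nothing meaningful to invert
replaced <- gsub("VVD|GL", "something", x)

is.logical(matches)
is.character(replaced)
```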
Actually, no regular expression is needed here, IMO:
library(dplyr)
exceptions <- c("VVD","GL","CDA","CU","D66","PVDA","CUSGP","SGP","PVDAGL")
df %>%
mutate(new = if_else(!(partij_kort %in% exceptions),
"something",
partij_kort))
This yields
# A tibble: 5 x 2
partij_kort new
<chr> <chr>
1 COMBGB something
2 VVD VVD
3 GL GL
4 NIEUWEL something
5 CDA CDA
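For readers who do not use dplyr, the same whole-string comparison can be sketched in base R. This is only an illustration of the %in% idea from the answer above; the vector name new is chosen to mirror the column name and is not part of the original answer.

```r
# Example data copied from the question
partij_kort <- c("COMBGB", "VVD", "GL", "NIEUWEL", "CDA")
exceptions <- c("VVD", "GL", "CDA", "CU", "D66", "PVDA", "CUSGP", "SGP", "PVDAGL")

# %in% compares whole strings, so partial matches and regex
# metacharacters play no role here
new <- ifelse(partij_kort %in% exceptions, partij_kort, "something")
new
```

Because %in% tests for exact equality, a value that merely contains one of the codes would still be replaced.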
You need perl = TRUE in gsub so that you can negate your alternatives with a negative lookahead in the regular expression.
library(tidyverse)
df <- structure(list(partij_kort = c("COMBGB", "VVD", "GL", "NIEUWEL", "CDA", "anything", "good" ,"bad","whtever")),
.Names = "partij_kort",
row.names = c(NA, -9L),
class = c("tbl_df", "tbl", "data.frame"))
df %>% mutate(new = gsub("^((?!(VVD|GL|CDA|CU|D66|PVDA|CUSGP|SGP|PVDAGL)).)*$",
"something", partij_kort, perl = TRUE))
# A tibble: 9 x 2
partij_kort new
<chr> <chr>
1 COMBGB something
2 VVD VVD
3 GL GL
4 NIEUWEL something
5 CDA CDA
6 anything something
7 good something
8 bad something
9 whtever something
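One thing worth knowing about this pattern: it leaves a string alone whenever one of the codes occurs anywhere inside it, not only when the whole string equals a code. The sketch below compares it with an anchored variant. The value "GLOBAL" is hypothetical and added only to show the difference, and whether exact matching is wanted is an assumption about the intent.

```r
# Party codes from the question, plus "GLOBAL" as a hypothetical value
# that merely contains the code "GL"
x <- c("COMBGB", "VVD", "GL", "NIEUWEL", "CDA", "GLOBAL")
codes <- "VVD|GL|CDA|CU|D66|PVDA|CUSGP|SGP|PVDAGL"

# Pattern from the answer: matches only strings that contain none of the codes
contains_none <- paste0("^((?!(", codes, ")).)*$")
by_substring <- gsub(contains_none, "something", x, perl = TRUE)

# Anchored variant (assumed intent): the lookahead rejects a string
# only when the whole string is exactly one of the codes
not_exactly <- paste0("^(?!(?:", codes, ")$).*$")
by_whole_string <- sub(not_exactly, "something", x, perl = TRUE)

by_substring
by_whole_string
```

With the first pattern "GLOBAL" is kept, because it contains "GL"; with the anchored variant it is replaced like any other non-code.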
Thanks
You can also use replace together with grepl, like this:
library(tidyverse)
df %>% mutate(new = replace(partij_kort , !grepl("VVD|GL|CDA|CU|D66|PVDA|CUSGP|SGP|PVDAGL",
partij_kort),"something"))
# A tibble: 5 x 2
# partij_kort new
# <chr> <chr>
#1 COMBGB something
#2 VVD VVD
#3 GL GL
#4 NIEUWEL something
#5 CDA CDA
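If the codes are already stored in a character vector, the pattern for grepl can be assembled from it rather than typed by hand. The sketch below also wraps the alternatives in ^( ... )$ so that only whole-string matches count; the vector exceptions is borrowed from the other answer, "GLOBAL" is a hypothetical extra value, and the anchoring is an assumption about what is wanted. It also assumes the codes contain no regex metacharacters.

```r
# Data from the question plus one hypothetical value containing "GL"
x <- c("COMBGB", "VVD", "GL", "NIEUWEL", "CDA", "GLOBAL")
exceptions <- c("VVD", "GL", "CDA", "CU", "D66", "PVDA", "CUSGP", "SGP", "PVDAGL")

# Build "^(VVD|GL|...|PVDAGL)$" from the vector
anchored <- paste0("^(", paste(exceptions, collapse = "|"), ")$")

# Replace every element that is not exactly one of the codes
new <- replace(x, !grepl(anchored, x), "something")
new
```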