remove all rows with a character string ending with a specific string in R - grepl
I want to remove all rows whose value ends with "_bundle". I tried two different approaches, but neither of them works:
claimsVolumeSC <- basisPerClaim[!grepl( '$_bundle', basisPerClaim$subcoveragekey),]
levels(claimsVolumeSC$subcoveragekey)
claimsVolumeSC <- basisPerClaim[!grepl( '\>_bundle', basisPerClaim$subcoveragekey),]
levels(claimsVolumeSC$subcoveragekey)
How can I achieve what I want?
Why don't the approaches I have tried so far work?
> claimsVolumeSC <- basisPerClaim[!grepl( '$_bundle', basisPerClaim$subcoveragekey),]
> levels(claimsVolumeSC$subcoveragekey)
[1] "DA_Chemo" "Daily_cash" "Funeral" "IP_Accommodation" "IP_bundle" "IP_Upgrade" "OP_Dialysis"
[8] "OP_Physio"
> claimsVolumeSC <- basisPerClaim[!grepl( '\>_bundle', basisPerClaim$subcoveragekey),]
> levels(claimsVolumeSC$subcoveragekey)
[1] "DA_Chemo" "Daily_cash" "Funeral" "IP_Accommodation" "IP_bundle" "IP_Upgrade" "OP_Dialysis"
[8] "OP_Physio"
One possible solution is this regex-based approach.
Reproducible data:
set.seed(123)
df <- data.frame(
  Var1 = rnorm(100),
  Var2 = sample(c(paste0(LETTERS[1:10], letters[10:18], letters[18:26], letters), paste0(letters[1:10], "bundle")), 100, replace = TRUE),
  Var3 = sample(c(paste0(LETTERS[1:10], letters), paste0(letters[1:10], "bundle")), 100, replace = TRUE))
head(df)
         Var1    Var2    Var3
1 -0.56047565    Irzi cbundle
2 -0.23017749 ibundle      Aa
3  1.55870831    Bmuv cbundle
4  0.07050839    Ijrs abundle
5  0.12928774    Eowo      Cw
6  1.71506499 fbundle      Hr
Solution:
Here we paste0 each row into a single string, use grepl to find the rows that contain bundle, and remove those rows from the data frame with a negated (-) which:
df[-which(grepl("bundle", apply(df, 1, paste0, collapse = " "))),]
Result:
If we store the subsetted data frame as df2, the result looks like this:
df2 <- df[-which(grepl("bundle", apply(df, 1, paste0, collapse = " "))),]
head(df2)
         Var1 Var2 Var3
5   0.1292877 Eowo   Cw
7   0.4609162 Dnvn   Ff
8  -1.2650612 Aksk   Aa
9  -0.6868529 Gpxg   Gq
10 -0.4456620 Gpxg   Hr
11  1.2240818 Hrzr   Eo
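Note:
This solution is useful if the string to match can occur in several columns. If the match only occurs in a single column, ordinary subsetting on that column is enough.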
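One caveat with the -which(...) idiom used above: when no row matches, which() returns an empty integer vector, and negative indexing with an empty vector selects zero rows instead of all rows. A minimal sketch of an equivalent form based on logical negation, which gives the same result when matches exist and also handles the no-match case. The small data frame toy is made up for illustration and is not part of the original post:

```r
# Hypothetical toy data, not from the original post
toy <- data.frame(
  Var2 = c("Irzi", "ibundle", "Bmuv"),
  Var3 = c("cbundle", "Aa", "Cw"),
  stringsAsFactors = FALSE
)

# Collapse each row into one string, then flag rows containing "bundle"
row_text <- apply(toy, 1, paste0, collapse = " ")
has_bundle <- grepl("bundle", row_text)

# Keep the rows that do NOT match; no which() needed
kept <- toy[!has_bundle, ]

# With a pattern that matches nothing, -which() drops every row,
# while logical negation keeps them all
no_hit <- grepl("zzz", row_text)
nrow(toy[-which(no_hit), ])  # 0 rows
nrow(toy[!no_hit, ])         # 3 rows
```

Note that "bundle" without an anchor matches the text anywhere in the row, not only at the end of a value.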
You can use subset together with grepl:
claimsVolumeSC <- subset(basisPerClaim, !grepl( '_bundle$', subcoveragekey))
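As for why the first attempt in the question fails: in a regular expression, $ anchors the end of the string, so it has to come after _bundle. The pattern '$_bundle' asks for text after the end of the string and therefore never matches. The second attempt appears to fail for a related reason: \> marks the end of a word, but _ counts as a word character, so there is no word boundary directly in front of _bundle. A minimal sketch of the difference for $, using a few of the keys shown in the question (the vector name keys is an assumption for illustration):

```r
# Example keys taken from the levels shown in the question
keys <- c("DA_Chemo", "IP_bundle", "IP_Upgrade", "OP_Physio")

# '$' placed first: nothing can follow the end of the string, so no match
grepl("$_bundle", keys)

# '$' placed last: matches only strings that end in "_bundle"
ends_with_bundle <- grepl("_bundle$", keys)

# Base R also has endsWith() for a fixed (non-regex) suffix
same_result <- endsWith(keys, "_bundle")

# Keys that remain after dropping the "_bundle" entries
keys[!ends_with_bundle]
```

endsWith() compares a literal suffix, which avoids regex anchors entirely when no pattern matching is needed.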
If you don't want factors in your data, convert them to character:
claimsVolumeSC$subcoveragekey <- as.character(claimsVolumeSC$subcoveragekey)
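Related to this: levels() lists every level a factor was created with, including levels that no longer occur in any row. So "IP_bundle" can still show up in the output of levels() even after the rows have been removed correctly. A minimal sketch, assuming subcoveragekey is a factor as in the question; the data frame claims and its amount column are made-up stand-ins for basisPerClaim:

```r
# Hypothetical stand-in for basisPerClaim
claims <- data.frame(
  subcoveragekey = factor(c("DA_Chemo", "IP_bundle", "OP_Physio")),
  amount = c(10, 20, 30)
)

# Remove the rows whose key ends in "_bundle"
filtered <- subset(claims, !grepl("_bundle$", subcoveragekey))

# The row is gone, but the unused level is still listed
levels(filtered$subcoveragekey)

# droplevels() removes levels that no longer occur in the data
filtered <- droplevels(filtered)
levels(filtered$subcoveragekey)
```

Checking unique(as.character(...)) or nrow() instead of levels() is another way to confirm that the filter worked.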