grepl in R: Replace character/numeric levels
I want to replace my levels dog1 ... dog4 and cat1 ... cat4 with two levels, DOG and CAT, but when I use grepl, my output contains only NA.
In my code:
x <- (rep(c("dog1","dog2","dog3","dog4","cat1","cat2","cat3","cat4"),2)) #Levels
y<-rnorm(16)
d<-data.frame(cbind(x,y))
head(d)
x y
1 dog1 0.906357739138289
2 dog2 0.974674552504268
3 dog3 0.664045049199848
4 dog4 0.911777985232099
5 cat1 0.246575548162824
6 cat2 0.758069789161901
d$x[grepl("dog", d$x)] <- "DOG"
Warning message:
In `[<-.factor`(`*tmp*`, grepl("dog", d$x), value = c(NA, NA, NA, :
invalid factor level, NA generated
d$x[grepl("cat", d$x)] <- "CAT"
Warning message:
In `[<-.factor`(`*tmp*`, grepl("cat", d$x), value = c(NA_integer_, :
invalid factor level, NA generated
head(d)
x y
1 <NA> 0.906357739138289
2 <NA> 0.974674552504268
3 <NA> 0.664045049199848
4 <NA> 0.911777985232099
5 <NA> 0.246575548162824
6 <NA> 0.758069789161901
If the code ran correctly, my desired output would be:
head(d)
x y
1 DOG 0.906357739138289
2 DOG 0.974674552504268
3 DOG 0.664045049199848
4 DOG 0.911777985232099
5 CAT 0.246575548162824
6 CAT 0.758069789161901
You can try creating the data frame with stringsAsFactors set to FALSE, so that x stays a character column:
d <- data.frame(cbind(x,y), stringsAsFactors=FALSE)
d$x[grepl("dog", d$x)] <- "DOG"
d$x[grepl("cat", d$x)] <- "CAT"
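If the data frame already exists and x is already a factor, a similar effect can probably be had by converting the column to character first. The sketch below uses a small made-up data frame (the values are hypothetical, not the asker's data) and sets stringsAsFactors = TRUE explicitly, since R 4.0.0 and later no longer create factors by default:

```r
# Hypothetical example data; stringsAsFactors = TRUE reproduces the
# pre-4.0.0 default where strings become factors.
d <- data.frame(x = c("dog1", "dog2", "cat1", "cat2"),
                y = c(0.1, 0.2, 0.3, 0.4),
                stringsAsFactors = TRUE)

d$x <- as.character(d$x)          # drop the factor structure, keep the labels
d$x[grepl("dog", d$x)] <- "DOG"   # plain character assignment now works
d$x[grepl("cat", d$x)] <- "CAT"
d$x <- factor(d$x)                # optional: turn the column back into a factor
levels(d$x)
```

The final factor() call is only needed if later code expects a factor.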
The key here (as Tim alluded to) is understanding that factor variables, although superficially similar, are actually quite different from character variables.
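To make the difference concrete, here is a minimal sketch on a made-up three-element vector (f, g and s are illustrative names only). A factor stores integer codes that point into a fixed set of levels, so assigning a value that is not one of those levels produces NA, whereas a character vector accepts any string:

```r
f <- factor(c("dog1", "cat1", "dog1"))
levels(f)        # the allowed labels, sorted: "cat1" "dog1"
as.integer(f)    # the stored integer codes: 2 1 2

# "DOG" is not an existing level, so this assignment yields NA
# (the "invalid factor level" warning is suppressed here)
g <- f
suppressWarnings(g[1] <- "DOG")
is.na(g[1])      # TRUE

# A character vector has no such restriction
s <- as.character(f)
s[1] <- "DOG"
s
```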
Here is one way to access and update the factor levels:
levels(d$x)
# [1] "cat1" "cat2" "cat3" "cat4" "dog1" "dog2" "dog3" "dog4"
levels(d$x)[grepl("dog", levels(d$x))] <- "DOG"
levels(d$x)[grepl("cat", levels(d$x))] <- "CAT"
head(d)
# x y
# 1 DOG -0.0489713202962167
# 2 DOG -0.548503649991368
# 3 DOG 0.460493884654479
# 4 DOG 0.143044665735075
# 5 CAT -2.13008189672678
# 6 CAT -0.136767747543626
levels(d$x)
# [1] "CAT" "DOG"
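Assigning the same label to several levels merges them, which is why only two levels remain. When the groups are known up front, the list form of levels<- may be an alternative: the names of the list become the new levels and the values name the old levels to fold into each. A small sketch on a made-up factor (not the asker's data):

```r
f <- factor(c("dog1", "dog2", "cat1", "cat2", "dog3"))

# New level name = old levels to merge into it
levels(f) <- list(DOG = c("dog1", "dog2", "dog3"),
                  CAT = c("cat1", "cat2"))

levels(f)          # "DOG" "CAT" (order follows the list)
as.character(f)    # "DOG" "DOG" "CAT" "CAT" "DOG"
```

Unlike the grepl version, this requires spelling out every old level, so it suits cases where the mapping does not follow a simple pattern.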
Another version, this time using a regular expression. We capture everything up to the trailing digits and convert it to uppercase (\U).
d$x <- sub("(.*)\\d+", "\\U\\1", d$x, perl = TRUE)
d$x
#[1] "DOG" "DOG" "DOG" "DOG" "CAT" "CAT" "CAT" "CAT" "DOG" "DOG" "DOG" "DOG"
# "CAT" "CAT" "CAT" "CAT"
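One caveat worth checking on your own data: because (.*) is greedy, a label with more than one trailing digit keeps all but the last digit. The sketch below uses a made-up vector (x, greedy and safer are illustrative names) to show that behaviour next to a variant that strips all trailing digits first and then upper-cases with toupper:

```r
x <- c("dog1", "dog12", "cat3")

# Greedy capture: for "dog12" the group matches "dog1", leaving only "2" for \\d+
greedy <- sub("(.*)\\d+", "\\U\\1", x, perl = TRUE)
greedy   # "DOG" "DOG1" "CAT"

# Remove every trailing digit, then convert to uppercase
safer <- toupper(sub("\\d+$", "", x))
safer    # "DOG" "DOG" "CAT"
```

For the single-digit labels in the question both versions give the same result.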