One of a factor's levels is an empty string; how to replace it with a non-missing value?
The data frame AEbySOC contains two columns: SOC, a factor with character levels, and Count, an integer:
> str(AEbySOC)
'data.frame': 19 obs. of 2 variables:
$ SOC : Factor w/ 19 levels "","Blood and lymphatic system disorders",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Count: int 25 50 7 3 1 49 49 2 1 9 ...
One of the SOC levels is an empty string:
> l = levels(AEbySOC$SOC)
> l[1]
[1] ""
I want to replace this level with a non-empty string, say "Not specified". This does not work:
> library(plyr)
> revalue(AEbySOC$SOC, c(""="Not specified"))
Error: attempt to use zero-length variable name
Nor does this:
> AEbySOC$SOC[AEbySOC$SOC==""] = "Not specified"
Warning message:
In `[<-.factor`(`*tmp*`, AEbySOC$SOC == "", value = c(NA, 2L, 3L, :
invalid factor level, NA generated
What is the correct way to do this? I appreciate any input/comment.
levels(AEbySOC$SOC)[1] <- "Not specified"
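Creating a toy example:
# stringsAsFactors = TRUE is needed on R >= 4.0, where strings are no longer converted to factors by default
df <- data.frame(a = c("", "a", "b"), stringsAsFactors = TRUE)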
df
# a
#1
#2 a
#3 b
levels(df$a)
#[1] "" "a" "b"
levels(df$a)[1] <- "Not specified"
levels(df$a)
#[1] "Not specified" "a" "b"
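The second attempt in the question warns because a factor only accepts values that are already in its level set, and "Not specified" is not one of them. A minimal sketch of the two ways around that, using a hypothetical toy factor `f` (not the OP's data): rename the level in place, or add the new level before assigning and then drop the unused one.

```r
# Hypothetical toy factor standing in for AEbySOC$SOC
f <- factor(c("", "a", "b"))

# Option A: rename the level itself; every element that had "" follows along
g <- f
levels(g)[levels(g) == ""] <- "Not specified"

# Option B: make the new value a legal level first, then assign element-wise
h <- f
levels(h) <- c(levels(h), "Not specified")  # append the new level
h[h == ""] <- "Not specified"               # no warning now
h <- droplevels(h)                          # remove the now-unused "" level
```

Both options give the same values; they differ only in where "Not specified" ends up in the level order (first with option A, last with option B).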
Edit
Based on the OP's comment, if we need to locate the level by its value rather than by its position, we can try
levels(AEbySOC$SOC)[levels(AEbySOC$SOC) == ""] <- "Not specified"
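As a rough check of the value-based version, here is a sketch on a small stand-in data frame. The name `aes` and the "Cardiac disorders" row are made up for illustration; only the first two levels and the counts come from the `str()` output in the question.

```r
# Hypothetical stand-in for AEbySOC (the real data is not shown in full)
aes <- data.frame(
  SOC = factor(c("", "Blood and lymphatic system disorders", "Cardiac disorders")),
  Count = c(25L, 50L, 7L)
)

# Rename whichever level equals "", wherever it sits in the level vector
levels(aes$SOC)[levels(aes$SOC) == ""] <- "Not specified"

# Running it again is harmless: no level matches "", so nothing changes
levels(aes$SOC)[levels(aes$SOC) == ""] <- "Not specified"
```

Only the level label changes; the rows and the Count column are left as they were.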
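Something like this should work:
# stringsAsFactors = TRUE is needed on R >= 4.0 so that column a is a factor
test <- data.frame(a = c("a", "b", "", " "), stringsAsFactors = TRUE)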
str(test)
which.one <- which( levels(test$a) == "" )
levels(test$a)[which.one] <- "NA"  # the string "NA", not a missing value
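Note that the test data also contains a level made of a single space, which the comparison with "" does not match. If whitespace-only levels should be treated as empty too (an assumption; the question only mentions ""), one possible sketch uses `trimws()` on the levels. Assigning the same label to several levels merges them into one.

```r
# Same toy data as above; stringsAsFactors = TRUE so that `a` is a factor on R >= 4.0
test <- data.frame(a = c("a", "b", "", " "), stringsAsFactors = TRUE)

# TRUE for levels that are empty once surrounding whitespace is stripped
blank <- trimws(levels(test$a)) == ""

# Both "" and " " receive the same label, so R merges them into one level
levels(test$a)[blank] <- "Not specified"
```

After this, `test$a` has three levels, and none of its elements is a missing value.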
A bit late to the party, but here is a tidyverse solution:
library(tidyverse)
df <- data.frame(SOC = c("", "a", "b"))
df <- df %>%
  mutate(SOC = fct_recode(SOC, "Not specified" = ""))
This results in:
SOC
1 Not specified
2 a
3 b