R - set reference level of factor to NA
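I have a data.table with some factor columns that contain NA values. I have deliberately included NA as a factor level (i.e. x <- factor(x, exclude=NULL), rather than the default behavior of x <- factor(x, exclude=NA)), because the NAs are meaningful to my model. For these factor columns, I would like to relevel() so that the reference level is NA, but I am struggling with the syntax.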
# silly reproducible example
library(data.table)
a <- data.table(animal      = c("turkey", "platypus", "dolphin"),
                mass_kg     = c(8, 2, 200),
                egg_size    = c("large", "small", NA),
                intelligent = c(0, 0, 1))
lr <- glm(intelligent ~ mass_kg + egg_size, data=a, family = binomial)
summary(lr)
# By default, egg_size is converted to a factor with no level for NA
# However, in this case NA is meaningful (since most mammals don't lay eggs)
a[,egg_size:=factor(egg_size, exclude=NULL) ] # exclude=NULL allows an NA level
lr <- glm(intelligent ~ mass_kg + egg_size, data=a, family = binomial)
summary(lr) # Now NA is included in the model, but not as the reference level
a[,levels(egg_size)] # Returns: [1] "large" "small" NA
a[,egg_size:=relevel(egg_size,ref=NA)]
# Returns:
# Error in relevel.factor(egg_size, ref = NA) :
# 'ref' must be an existing level
What is the correct syntax for relevel(), or do I need to use something else? Many thanks.
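You have to specify the correct type of NA, namely NA_character_, but then relevel() throws out the NA in its result, which is probably a bug. A workaround is to specify the levels directly yourself: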
# throw out NA's to begin with
egg_size = factor(c("large","small",NA), exclude = NA)
# but then add them back at the beginning
factor(egg_size, c(NA, levels(egg_size)), exclude = NULL)
#[1] large small <NA>
#Levels: <NA> large small
In case you are wondering, c converts the NA from logical to the correct type.
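As a minimal sketch, the two steps of the workaround can be wrapped in a small helper, and the remark about c can be checked with typeof(). The name na_first is hypothetical (it is not a function from base R or data.table), and the sketch assumes the input is a character vector or a factor.

```r
# c() coerces the logical NA to the type of the other elements
typeof(NA)               # "logical"
typeof(c(NA, "large"))   # "character"

# Hypothetical helper (not part of base R or data.table): returns x as a
# factor whose first level, and hence reference level, is NA.
na_first <- function(x) {
  # Step 1: drop any existing NA level so levels(x) holds only real values
  x <- factor(x, exclude = NA)
  # Step 2: add NA back as the first level; exclude = NULL keeps it
  factor(x, levels = c(NA, levels(x)), exclude = NULL)
}

egg_size <- na_first(c("large", "small", NA))
levels(egg_size)         # NA "large" "small"
```

With the data.table a from the question, the column could then be rebuilt with a[, egg_size := na_first(egg_size)]. Under the default treatment contrasts, glm() should then use NA as the reference level, since it is the first level.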