Replacing NA factor levels with "None" in multiple columns
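I am working with the House Prices: Advanced Regression Techniques dataset, which includes several factor variables that have NA among their levels. Consider the columns PoolQC, Alley and MiscFeature. I want to replace all of these NAs with "None" in a single function, but I have not managed to do so. This is what I have tried so far: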
MissingLevels <- function(x){
  for(i in names(x)){
    levels <- levels(x[i])
    levels[length(levels) + 1] <- 'None'
    x[i] <- factor(x[i], levels = levels)
    x[i][is.na(x[i])] <- 'None'
    return(x)
  }
}
MissingLevels(df[,c('Alley', 'Fence')])
apply(df[,c('Alley', 'Fence')], 2, MissingLevels)
https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
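There are several ways to do this. For example, starting from:
# stringsAsFactors = TRUE is needed in R >= 4.0 so that Alley and Fence are factors
x <- data.frame(another = 1:3, Alley = c("A", "B", NA), Fence = c("C", NA, NA), stringsAsFactors = TRUE)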
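Option 1: Using the forcats package
library(forcats)
# Note: in forcats >= 1.0.0, fct_explicit_na() is deprecated in favour of fct_na_value_to_level(f, level = "None")
x[,c("Alley", "Fence")] <- lapply(x[,c("Alley", "Fence")], fct_explicit_na, na_level = "None")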
  another Alley Fence
1       1     A     C
2       2     B  None
3       3  None  None
Option 2: Using base R only
x[,c("Alley", "Fence")] <- lapply(x[,c("Alley", "Fence")], function(x){`levels<-`(addNA(x), c(levels(x), "None"))})
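The one-liner in Option 2 is dense, so here is a sketch of what it does to a single column, unpacked into two steps. The vector `f` is a made-up toy factor, not a column from the dataset.

```r
# Toy factor with a missing value (hypothetical data)
f <- factor(c("A", "B", NA))

# Step 1: addNA() promotes NA to a real level, so the levels become "A", "B", NA
g <- addNA(f)

# Step 2: levels are renamed by position, so the trailing NA level becomes "None"
levels(g) <- c(levels(f), "None")

print(g)
```

Because the rename is positional, this relies on addNA() always placing the NA level last, after the original levels.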
PS: The second option was inspired by a post from @G. Grothendieck.
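As for why the original attempt does not work: `x[i]` is a one-column data frame rather than the column itself, so `levels(x[i])` is NULL; `return(x)` sits inside the loop, so the function exits after the first column; and `apply()` coerces the data frame to a character matrix before calling the function. Below is a minimal sketch of a corrected function in base R. The name `replace_na_levels` and its arguments are my own choices for illustration, not part of any package.

```r
# Hypothetical helper: give NA an explicit level in the selected columns
replace_na_levels <- function(df, cols, na_level = "None") {
  df[cols] <- lapply(df[cols], function(col) {
    # Accept character columns too (the default in R >= 4.0)
    if (!is.factor(col)) col <- factor(col)
    # Add the new level first, otherwise the assignment below would produce NA
    levels(col) <- union(levels(col), na_level)
    col[is.na(col)] <- na_level
    col
  })
  df
}

x <- data.frame(another = 1:3, Alley = c("A", "B", NA), Fence = c("C", NA, NA),
                stringsAsFactors = TRUE)
x <- replace_na_levels(x, c("Alley", "Fence"))
print(x)
```

Columns are selected with `df[cols]` and processed with `lapply()`, which keeps each column as a factor; only the named columns are touched.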