Change NAs in multiple columns of a data frame

Question

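I have a data frame (called hp) with several columns that contain NAs. The class of these columns is factor. First I want to convert them to character, fill the NAs with "none", and then convert them back to factor. I have 14 columns, so I would like to do this with a loop. But it does not work.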

Thank you for your help.

The columns:

miss_names <- c("Alley", "MasVnrType", "FireplaceQu", "PoolQC", "Fence",
                "MiscFeature", "GarageFinish", "GarageQual", "GarageCond",
                "BsmtQual", "BsmtCond", "BsmtExposure", "BsmtFinType1",
                "BsmtFinType2", "Electrical")

The loop:

for (i in miss_names){       
    hp[i]<-as.character(hp[i])
    hp[i][is.na(hp[i])]<-"NONE"
    hp[i]<-as.factor(hp[i])
    print(hp[i])
    }

 Error in sort.list(y) : 'x' must be atomic for 'sort.list'
 Have you called 'sort' on a list?

Answer 1

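Use addNA() to add NA as a factor level, then replace that level with whatever you want. You do not have to convert the factors to character vectors first. You can loop over all the factors in the data frame and replace them one by one.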

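# Sample data
dd <- data.frame(
  x = sample(c(NA, letters[1:3]), 20, replace = TRUE),
  y = sample(c(NA, LETTERS[1:3]), 20, replace = TRUE),
  stringsAsFactors = TRUE  # needed in R >= 4.0.0 so the columns are factors
)

# Loop over the columns
for (i in seq_along(dd)) {
  xx <- addNA(dd[, i])                      # add NA as an explicit level
  levels(xx) <- c(levels(dd[, i]), "none")  # rename that level to "none"
  dd[, i] <- xx
}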

This gives us

> str(dd)
'data.frame':   20 obs. of  2 variables:
 $ x: Factor w/ 4 levels "a","b","c","none": 1 4 1 4 4 1 4 3 3 3 ...
 $ y: Factor w/ 4 levels "A","B","C","none": 1 1 2 2 1 3 3 3 4 1 ...
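
As for the loop in the question: hp[i] returns a one-column data frame (a list) rather than a vector, which is most likely why as.factor() ends in the sort.list error. A minimal sketch of the character round trip using hp[[i]] instead is shown here. The small hp below is a made-up stand-in with only two of the question's columns, and its values are invented for illustration.

```r
# Hypothetical stand-in for the asker's hp (values are invented)
hp <- data.frame(
  Alley   = factor(c("Grvl", NA, "Pave", NA)),
  Fence   = factor(c(NA, "MnPrv", NA, "GdWo")),
  LotArea = c(8450, 9600, 11250, 9550)
)
miss_names <- c("Alley", "Fence")

for (i in miss_names) {
  col <- as.character(hp[[i]])  # [[ ]] extracts the column as a plain vector
  col[is.na(col)] <- "none"     # fill the missing values
  hp[[i]] <- as.factor(col)     # convert back to factor
}

str(hp)
```

Columns not listed in miss_names (LotArea here) are left untouched.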

Answer 2

An alternative solution using the purrr library, with the same data as @Johan Larsson:

library(purrr)

set.seed(15)
dd <- data.frame(
  x = sample(c(NA, letters[1:3]), 20, replace = TRUE),
  y = sample(c(NA, LETTERS[1:3]), 20, replace = TRUE),
  stringsAsFactors = TRUE  # needed in R >= 4.0.0 so the columns are factors
)

# Create a function to convert NA to "none"
convert.to.none <- function(x) {
  y <- addNA(x)                      # add NA as an explicit level
  levels(y) <- c(levels(x), "none")  # rename that level to "none"
  y
}

# Use map_df() to cycle through dd's columns (requires dplyr to be installed)
map_df(dd, convert.to.none)

This lets you scale up your work.
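
If only some columns should be converted, as with miss_names in the question, a base R variant without purrr might look like the sketch below. The helper is repeated so the block runs on its own; the column selection cols and the extra numeric column z are assumptions added for illustration.

```r
# Same helper as above, repeated so the sketch is self-contained
convert.to.none <- function(x) {
  y <- addNA(x)                      # add NA as an explicit level
  levels(y) <- c(levels(x), "none")  # rename that level to "none"
  y
}

dd <- data.frame(
  x = factor(c("a", NA, "b", NA)),
  y = factor(c(NA, "A", "B", "A")),
  z = c(1, 2, NA, 4)  # numeric column that should be left alone
)

cols <- c("x", "y")  # hypothetical selection of factor columns
dd[cols] <- lapply(dd[cols], convert.to.none)

str(dd)
```

Because the assignment targets only dd[cols], the NA in z stays as it is.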
