Change NAs in multiple columns of a data frame
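I have a data frame (called hp) with several columns that contain NAs. These columns are factors. I want to convert them to character, fill the NAs with "none", and convert them back to factor. I have 14 columns, so I would like to do this with a loop, but it does not work.
Thanks for your help.
The columns: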
miss_names<-c("Alley","MasVnrType","FireplaceQu","PoolQC","Fence","MiscFeature","GarageFinish", "GarageQual","GarageCond","BsmtQual","BsmtCond","BsmtExposure","BsmtFinType1",
"BsmtFinType2","Electrical")
The loop:
for (i in miss_names){
hp[i]<-as.character(hp[i])
hp[i][is.na(hp[i])]<-"NONE"
hp[i]<-as.factor(hp[i])
print(hp[i])
}
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
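The error most likely comes from single-bracket indexing: hp[i] returns a one-column data frame (which is a list), not the column vector, so as.factor(hp[i]) ends up sorting a list. A minimal sketch of the same loop using [[ ]] instead, assuming a small stand-in for hp (the sample values below are hypothetical):

```r
# Hypothetical stand-in for hp: two factor columns containing NAs
hp <- data.frame(
  Alley = factor(c("Grvl", NA, "Pave", NA)),
  Fence = factor(c(NA, "MnPrv", NA, "GdWo"))
)
miss_names <- c("Alley", "Fence")

for (i in miss_names) {
  col <- as.character(hp[[i]])  # [[ ]] extracts the column as an atomic vector
  col[is.na(col)] <- "NONE"     # fill the missing values
  hp[[i]] <- factor(col)        # convert back to factor
}

str(hp)
```

This keeps the character round trip from the question; only the indexing changes.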
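Use addNA() to add NA as a factor level, and then replace that level with whatever you want. You do not have to convert the factors to character vectors first. You can loop over all the factors in the data frame and replace them one by one.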
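# Sample data
# stringsAsFactors = TRUE is needed in R >= 4.0, where strings are
# no longer converted to factors by default
dd <- data.frame(
  x = sample(c(NA, letters[1:3]), 20, replace = TRUE),
  y = sample(c(NA, LETTERS[1:3]), 20, replace = TRUE),
  stringsAsFactors = TRUE
)
# Loop over the columns
for (i in seq_along(dd)) {
  xx <- addNA(dd[, i])                       # add NA as the last level
  levels(xx) <- c(levels(dd[, i]), "none")   # rename that NA level to "none"
  dd[, i] <- xx
}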
This gives us
> str(dd)
'data.frame': 20 obs. of 2 variables:
$ x: Factor w/ 4 levels "a","b","c","none": 1 4 1 4 4 1 4 3 3 3 ...
$ y: Factor w/ 4 levels "A","B","C","none": 1 1 2 2 1 3 3 3 4 1 ...
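The same idea can be limited to a chosen set of columns, as in the question's miss_names. The sketch below is one possible variant, not the answer's own code: the helper name fill_na_level and the column vector cols are hypothetical, and addNA(ifany = TRUE) is used so that columns without NAs do not gain an unused level.

```r
# Hypothetical helper: turn NA into a regular factor level called `label`
fill_na_level <- function(x, label = "none") {
  x <- addNA(x, ifany = TRUE)              # add an NA level only if x contains NAs
  levels(x)[is.na(levels(x))] <- label     # rename the NA level, if present
  x
}

# Small example data: two factor columns and one numeric column
dd <- data.frame(
  x = factor(c("a", NA, "b")),
  y = factor(c("A", "B", "A")),
  z = c(1, NA, 3)
)

cols <- c("x", "y")                        # columns to process
dd[cols] <- lapply(dd[cols], fill_na_level)

str(dd)
```

The numeric column z is left untouched, including its NA.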
An alternative solution using the purrr library, with the same data as @Johan Larsson:
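library(purrr)
set.seed(15)
# stringsAsFactors = TRUE is needed in R >= 4.0 so that x and y are factors
dd <- data.frame(
  x = sample(c(NA, letters[1:3]), 20, replace = TRUE),
  y = sample(c(NA, LETTERS[1:3]), 20, replace = TRUE),
  stringsAsFactors = TRUE)
# Create a function to convert NA to none
convert.to.none <- function(x) {
  y <- addNA(x)                          # add NA as the last level
  levels(y) <- c(levels(x), "none")      # rename that NA level to "none"
  y
}
# Use the map function to cycle through dd's columns
# (map_df() needs the dplyr package to be installed)
map_df(dd, convert.to.none)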
This lets you scale the approach to any number of columns.
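When the data frame also holds non-factor columns, one option is purrr's modify_if(), which applies a function only to the columns that satisfy a predicate and returns a data frame. This is a sketch under that assumption; the example data below is hypothetical, and convert.to.none is redefined so the block stands alone.

```r
library(purrr)

# Same helper as above: turn NA into a "none" level
convert.to.none <- function(x) {
  y <- addNA(x)
  levels(y) <- c(levels(x), "none")
  y
}

# Hypothetical mixed data: one factor column and one numeric column
dd <- data.frame(
  x = factor(c("a", NA, "b", NA)),
  n = c(1.5, NA, 3, 4)
)

# Only columns for which is.factor() is TRUE are modified
out <- modify_if(dd, is.factor, convert.to.none)
str(out)
```

The numeric column n keeps its NA, since the predicate skips it.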