Replacing NA in factor columns converts them to character in R
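I want to replace the NAs in selected columns with the last of each column's levels, but doing so keeps converting the columns to character: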
table(sapply(cop2014, class))
factor numeric
400 116
varToCat = c("V21A","A3","Escolari","A17","B8","C5B","RamaEmpPri","C11","C16B",
"C16C","D4B","D4C","RamaEmpSec","RamaUltEmpCesant","G12",
"RamaFuerzaTrab","OcupFuerzaTrab","ActNoMer")
cop2014[,varToCat] = sapply(cop2014[,varToCat],
function(col) replace(col, is.na(col), last(levels(col))))
When I check the classes of the variables, I can see that they have changed:
table(sapply(cop2014, class))
character factor numeric
18 382 116
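Any hints as to why this happens? I just want to replace NA with a valid factor level (in this case, the last level).
This is a case of sapply simplifying its result to a matrix, and a matrix can hold only one class. So use lapply instead of sapply: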
library(dplyr)  # last() is not base R; dplyr provides it (tail(levels(x), 1) is a base equivalent)
df1[] <- lapply(df1, function(x) replace(x, is.na(x), last(levels(x))))
str(df1)
#'data.frame': 10 obs. of 2 variables:
#$ v1: Factor w/ 3 levels "B","D","E": 1 1 3 2 2 3 1 3 3 1
#$ v2: Factor w/ 5 levels "A","B","C","D",..: 4 3 5 5 2 5 2 1 4 1
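The question only touches a subset of columns, so here is a minimal sketch of the same lapply idea restricted to selected columns. The data frame dat and the vector cols are made up for illustration (cols stands in for varToCat), and tail(levels(x), 1) is used instead of last() so that no package is needed.

```r
# Hypothetical data: two factor columns to fix, one numeric column to leave alone
dat <- data.frame(
  grp   = factor(c("low", NA, "high", "low")),
  kind  = factor(c(NA, "b", "a", "b")),
  score = c(1.5, 2.0, NA, 4.2)
)
cols <- c("grp", "kind")  # stands in for varToCat

# lapply returns a list, so each column keeps its factor class on assignment
dat[cols] <- lapply(dat[cols], function(x) {
  replace(x, is.na(x), tail(levels(x), 1))
})

sapply(dat, class)  # grp and kind are still "factor", score is "numeric"
```

Assigning into dat[cols] rather than dat[, cols] treats the data frame as a list of columns, which matches what lapply returns.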
If we look at the output of sapply, it is a matrix, which can hold only one class. In the conversion to a matrix, the factor attributes are lost and the values become character:
sapply(df1, function(x) replace(x, is.na(x), last(levels(x))))
# v1 v2
# [1,] "B" "D"
# [2,] "B" "C"
# [3,] "E" "E"
# [4,] "D" "E"
# [5,] "D" "B"
# [6,] "E" "E"
# [7,] "B" "B"
# [8,] "E" "A"
# [9,] "E" "D"
#[10,] "B" "A"
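To see the difference side by side, here is a small sketch on toy data. The data frame df and the helper fill_last are hypothetical names, not part of the original question.

```r
# Toy data (hypothetical): two factor columns with missing values
df <- data.frame(
  a = factor(c("x", NA, "y")),
  b = factor(c("p", "q", NA))
)

# Replace NA with the last level of the column
fill_last <- function(col) replace(col, is.na(col), tail(levels(col), 1))

via_sapply <- sapply(df, fill_last)  # simplified to a single matrix
via_lapply <- lapply(df, fill_last)  # list with one factor per column

is.matrix(via_sapply)      # TRUE
typeof(via_sapply)         # "character"
sapply(via_lapply, class)  # "factor" for both a and b
```

Passing simplify = FALSE to sapply should also avoid the matrix step, since it then returns a list just as lapply does.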
Besides lapply, we can also use mutate_at from the tidyverse:
library(dplyr)
# funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
cop2014 <- cop2014 %>%
  mutate_at(varToCat, ~ replace(., is.na(.), last(levels(.))))
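In dplyr 1.0.0 and later, mutate_at is superseded by across. A rough equivalent might look like the sketch below; dat and cols are hypothetical stand-ins for cop2014 and varToCat.

```r
library(dplyr)

# Hypothetical data: one factor column to fix, one numeric column to leave alone
dat <- data.frame(
  grp = factor(c("low", NA, "high")),
  n   = c(1, NA, 3)
)
cols <- "grp"  # stands in for varToCat

# across() applies the lambda to each selected column; all_of() selects by name
out <- dat %>%
  mutate(across(all_of(cols), ~ replace(.x, is.na(.x), last(levels(.x)))))

sapply(out, class)  # grp stays "factor", n stays "numeric"
```

Only the columns named in cols are modified, so the NA in n is left as it is.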
Data
f1 <- function(n) sample(c(LETTERS[1:5], NA), n, replace = TRUE)
set.seed(24)
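# stringsAsFactors = TRUE is needed in R >= 4.0.0, where strings are no longer converted to factors by default
df1 <- data.frame(v1 = f1(10), v2 = f1(10), stringsAsFactors = TRUE)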