Calculate factors across columns in R

I need to convert a series of character columns to factors. I then need the factors to map to the same enumerated values across columns when they are converted to numeric.

as.numeric(as.factor(characterColumnDataFrame))

This currently factors each column independently, so the resulting numbers do not correspond to the same strings across columns.

I'd like to avoid converting one column and then looking up and mapping the enumeration from the first column.
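To make the problem concrete, here is a small sketch with made-up data (two columns that share the values "b" and "c"). Factoring each column on its own gives each column its own levels, so the same string can receive different codes:

```r
# Two character columns that share some values ("b" and "c" appear in both)
DF <- data.frame(V1 = c("a", "b", "c"),
                 V2 = c("c", "b", "d"),
                 stringsAsFactors = FALSE)

# Factoring each column separately: V1 gets levels a,b,c and V2 gets b,c,d
codes <- sapply(DF, function(x) as.numeric(factor(x)))

# "c" is coded 3 in V1 but 2 in V2, which is the mismatch described above
codes
```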

Use levels= when creating the factors. DF has character columns, and DF2 has factor columns that all share the same levels, levs:

# test data frame
DF <- as.data.frame(matrix(letters,, 2), stringsAsFactors = FALSE) 

DF2 <- DF
levs <- sort(unique(unlist(DF)))
DF2[] <- lapply(DF2, factor, levels = levs)
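Converting the shared-level factors to numbers then gives codes that agree across columns. A minimal sketch using made-up overlapping data (rather than the letters test frame, whose two columns never share a value) so the effect is visible:

```r
DF <- data.frame(V1 = c("a", "b", "c"),
                 V2 = c("c", "b", "d"),
                 stringsAsFactors = FALSE)

# All values from all columns, sorted, used as the common level set
levs <- sort(unique(unlist(DF)))
DF2 <- DF
DF2[] <- lapply(DF2, factor, levels = levs)

# Numeric codes: "c" is now 3 in both columns, "b" is 2 in both
DF_num <- DF2
DF_num[] <- lapply(DF2, as.numeric)
DF_num
```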

This can be written as a single line:

DF2 <- replace(DF, TRUE, lapply(DF, factor, levels = sort(unique(unlist(DF)))))
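If only the numeric codes are needed, base R's match() can produce them directly against the shared levs vector without building factors first. This is a swapped-in technique, sketched here for comparison rather than taken from the answer above:

```r
DF <- as.data.frame(matrix(letters,, 2), stringsAsFactors = FALSE)

# One sorted set of values drawn from every column
levs <- sort(unique(unlist(DF)))

# match() returns each value's position in levs, i.e. the same code a factor
# with levels = levs would give, but as an integer vector directly
DF_codes <- DF
DF_codes[] <- lapply(DF, match, table = levs)
head(DF_codes, 3)
```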
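
For a character xts object, the whole matrix can be factored at once so that every column shares one set of levels:

library(xts)  # also attaches zoo, which provides coredata<-
test <- xtsCharacterObjectWithManyColumns
# factor all cells together; unique(as.vector(...)) gives one flat level vector
# (unique() on the matrix itself would return unique rows, not unique values)
coredata(test) <- as.numeric(factor(coredata(test),
                                    levels = unique(as.vector(coredata(test))),
                                    ordered = TRUE))
# make sure the stored codes are numeric rather than character
storage.mode(test) <- "numeric"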
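The key idea in the xts approach is that factor() sees every cell at once, so only one set of levels exists. A base-R sketch of the same idea on a plain character matrix (a stand-in for coredata() of the xts object; the data are made up):

```r
m <- matrix(c("a", "b", "c",
              "c", "b", "d"), ncol = 2,
            dimnames = list(NULL, c("V1", "V2")))

# Factor every cell together: one shared set of levels for all columns
f <- factor(m, levels = unique(as.vector(m)))

# Put the integer codes back into the original matrix shape
codes <- matrix(as.integer(f), nrow = nrow(m), dimnames = dimnames(m))
codes
```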

The fct_unify() function in Hadley Wickham's forcats package unifies the levels across a list of factors.

# using G. Grothendieck's test data frame
DF <- as.data.frame(matrix(letters,, 2), stringsAsFactors = FALSE)
str(DF)
'data.frame': 13 obs. of  2 variables:
 $ V1: chr  "a" "b" "c" "d" ...
 $ V2: chr  "n" "o" "p" "q" ...
DF[] <- lapply(DF, factor)
str(DF)
'data.frame': 13 obs. of  2 variables:
 $ V1: Factor w/ 13 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ V2: Factor w/ 13 levels "n","o","p","q",..: 1 2 3 4 5 6 7 8 9 10 ...
DF[] <- forcats::fct_unify(DF)
str(DF)
'data.frame': 13 obs. of  2 variables:
 $ V1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ V2: Factor w/ 26 levels "a","b","c","d",..: 14 15 16 17 18 19 20 21 22 23 ...
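Without forcats, the result shown above (levels unioned in order of first appearance across the columns) can be approximated in base R with Reduce(union, ...). This is a sketch of the idea, not forcats' own implementation, though it should produce the same levels as in the str() output above:

```r
DF <- as.data.frame(matrix(letters,, 2), stringsAsFactors = FALSE)
DF[] <- lapply(DF, factor)

# Union of each column's existing levels, in order of appearance
all_levels <- Reduce(union, lapply(DF, levels))

# Re-level every column against the shared set (the values are unchanged)
DF[] <- lapply(DF, factor, levels = all_levels)
str(DF)
```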

Or, as a one-liner that returns the unified level numbers:

DF[] <- lapply(forcats::fct_unify(lapply(DF, factor)), as.numeric)
DF
   V1 V2
1   1 14
2   2 15
3   3 16
4   4 17
5   5 18
6   6 19
7   7 20
8   8 21
9   9 22
10 10 23
11 11 24
12 12 25
13 13 26
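Once the columns are plain numbers, the level labels are gone, so mapping a code back to its string needs the shared level vector kept somewhere. A minimal base-R sketch; encode_columns() is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: encode all columns against one shared, sorted level set
# and return the level vector alongside the codes
encode_columns <- function(df) {
  levs <- sort(unique(unlist(lapply(df, as.character))))
  codes <- df
  codes[] <- lapply(df, function(x) as.integer(factor(x, levels = levs)))
  list(codes = codes, levels = levs)
}

DF <- as.data.frame(matrix(letters,, 2), stringsAsFactors = FALSE)
enc <- encode_columns(DF)

# Decoding: indexing the level vector with a code returns the original string
decoded <- enc$levels[enc$codes$V2]
decoded
```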