What is the R equivalent of Python's .cat.codes, which converts a categorical variable to integer levels?

Question

In Python (pandas), you can use .cat.codes to generate category codes for a variable, for example:

df['col3'] = df['col3'].astype('category').cat.codes

How do you do this in R?

Answer 1

To flesh this out further for @Sid29:

The pandas accessor .cat.codes extracts the numeric representation of the factor levels. The equivalent in R is:

a <- factor(c("good", "bad", "good", "bad", "terrible"))

as.numeric(a)
[1] 2 1 2 1 3

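Note that .cat.codes represents NA (or NaN, which is the same thing) as -1, whereas the R solution above preserves NA, so the output for such a value is simply NA. The codes also differ by one: pandas codes start at 0, while R's integer codes start at 1.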
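The difference between the two conventions can be sketched in plain Python, without pandas. This is only an illustration: cat_codes and r_style_codes are hypothetical helper names, and the sketch assumes the categories are sorted alphabetically, which is what both pandas and R do by default for string values like these.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing, roughly as pandas does."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def cat_codes(values):
    """Imitate pandas' .cat.codes: 0-based codes over sorted categories, -1 for missing."""
    categories = sorted({v for v in values if not is_missing(v)})
    index = {category: code for code, category in enumerate(categories)}
    return [-1 if is_missing(v) else index[v] for v in values]


def r_style_codes(values):
    """Imitate as.integer(factor(x)) in R: 1-based codes, missing stays missing (None here)."""
    return [None if code == -1 else code + 1 for code in cat_codes(values)]


values = ["good", "bad", "good", None, "terrible"]
print(cat_codes(values))      # [1, 0, 1, -1, 2]
print(r_style_codes(values))  # [2, 1, 2, None, 3]
```

Going the other way, from R's codes to pandas-style codes, would mean subtracting 1 and replacing NA with -1.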

Edit: as.numeric(a) is better. Regarding the discussion about using the labels function inside as.numeric, see the warning in ?factor:

In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

There are some anomalies associated with factors that have NA as a level. It is suggested to use them sparingly, e.g., only for tabulation purposes.

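If you have an NA value, it coerces all values to NA, which is why labels was used. Interestingly, c(a) also worked at the time of writing (see @42's answer below); since R 4.1.0, c() applied to a factor returns a factor rather than its integer codes.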

Answer 2

Perhaps it is clearer to do the following:

# if you want numeric code for every value
a <- factor(c("good", "bad", "good", "bad", "terrible"))
as.integer(a)
# 2 1 2 1 3


# unique labels and the values for them
setNames(levels(a), seq_along(levels(a)))
#    1          2          3 
# "bad"     "good" "terrible"

r

numeric

categorical-data

r-factor