R: Encoding categorical data using across()

Question

I have a dataset with character-type features (not all of them are binary, and one of them represents a region).

To avoid calling the function once per column, I tried using a pipe and across() to select every character column and encode it with a function I wrote:

encode_ordinal <- function(x, order = unique(x)) {
  x <- as.numeric(factor(x, levels = order, exclude = NULL))
  x
}

dataset <- dataset %>% 
  encode_ordinal(across(where(is.character)))

However, it seems I am not using across() correctly, because I get this error:

Error: `across()` must only be used inside dplyr verbs.

Am I overcomplicating this? Is there a simpler way to identify all character-type features and encode them?

Answer 1

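across() only works inside a dplyr verb, so call it inside mutate() and pass encode_ordinal as the function to apply, as in the following example: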

library(dplyr)  # provides tibble(), %>%, mutate(), across() and where()

dataset <- tibble(x = 1:3, y = c('a', 'b', 'b'), z = c('A', 'A', 'B'))
# # A tibble: 3 x 3
#       x y     z    
#   <int> <chr> <chr>
# 1     1 a     A    
# 2     2 b     A    
# 3     3 b     B    

dataset %>%
    mutate(across(where(is.character), encode_ordinal))
# # A tibble: 3 x 3
#       x     y     z
#   <int> <dbl> <dbl>
# 1     1     1     1
# 2     2     2     1
# 3     3     2     2
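Note that encode_ordinal defaults to order = unique(x), so the codes follow the order in which values first appear in each column. If one column has a meaningful order, a rough sketch of one way to handle it is to encode that column explicitly first and let across() pick up the remaining character columns. The `size` column and its levels below are hypothetical and only serve as an illustration; they are not part of the original dataset.

```r
library(dplyr)

# Same helper as in the question: map each value to its position in `order`.
encode_ordinal <- function(x, order = unique(x)) {
  as.numeric(factor(x, levels = order, exclude = NULL))
}

# Hypothetical data: `size` has a natural order, `y` does not.
dataset <- tibble(
  x    = 1:3,
  y    = c('a', 'b', 'b'),
  size = c('large', 'small', 'medium')
)

encoded <- dataset %>%
  # Encode the ordered column with an explicit level order.
  mutate(size = encode_ordinal(size, order = c('small', 'medium', 'large'))) %>%
  # `size` is numeric now, so only the remaining character columns match.
  mutate(across(where(is.character), encode_ordinal))
```

Splitting this into two mutate() calls keeps the second step simple: by the time across() runs, the explicitly encoded column is no longer of type character.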
