Convert categories in one column to multiple columns coded as 1 or 0 (present or absent) in R

Question

My data looks like this:

library(dplyr)
library(tidyr)
a <- tibble(type = c("A", "A", "B", "B", "C", "D"))  # tibble() replaces the deprecated data_frame()
print(a)
# A tibble: 6 x 1
type 
<chr>
1 A    
2 A    
3 B    
4 B    
5 C    
6 D

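Here type holds categorical information. I am trying to turn each category in type into its own column, coded 1 if that row has the category and 0 otherwise, so the end result would look like this: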

b <- tibble(A = c(1, 1, 0, 0, 0, 0),
            B = c(0, 0, 1, 1, 0, 0),
            C = c(0, 0, 0, 0, 1, 0),
            D = c(0, 0, 0, 0, 0, 1))

   # A tibble: 6 x 4
     A     B     C     D
   <dbl> <dbl> <dbl> <dbl>
1    1.    0.    0.    0.
2    1.    0.    0.    0.
3    0.    1.    0.    0.
4    0.    1.    0.    0.
5    0.    0.    1.    0.
6    0.    0.    0.    1.

I have tried the following:

a$dat <- 1
spread(a, type, dat)

However, it does not work because some categories appear more than once. Any help would be appreciated. Thanks!

Answer 1

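This is probably a duplicate; what you are doing is usually called "one-hot encoding". One way is to use model.matrix, where the - 1 in the formula drops the intercept so that every level gets its own column: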

library(tidyverse)

a %>% 
  model.matrix(~ . - 1, data = .) %>%
  as_tibble()  # as_tibble() replaces the deprecated as_data_frame()

# A tibble: 6 x 4
  typeA typeB typeC typeD
  <dbl> <dbl> <dbl> <dbl>
1     1     0     0     0
2     1     0     0     0
3     0     1     0     0
4     0     1     0     0
5     0     0     1     0
6     0     0     0     1
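
The columns above come out as typeA, typeB, and so on, while the question asks for plain A, B, C, D. A minimal sketch of one way to strip that prefix follows, assuming the column is named type as in the question. It uses base R only so it runs on its own, and the names m and b are just illustrative.

```r
# Same data as in the question, built with base R so the sketch is self-contained.
a <- data.frame(type = c("A", "A", "B", "B", "C", "D"))

# One-hot encode; "- 1" drops the intercept so every level gets its own column.
m <- model.matrix(~ type - 1, data = a)

# model.matrix() prefixes each column with the variable name ("type");
# remove that prefix so the columns are named after the categories only.
colnames(m) <- sub("^type", "", colnames(m))

# Convert the numeric matrix to a data frame with columns A, B, C, D.
b <- as.data.frame(m)
print(b)
```

If a tibble is preferred, passing m to as_tibble() instead of as.data.frame() should work the same way.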

Answer 2

Another option is table() from base R:

table(seq_len(nrow(a)), a$type)
#    A B C D
#  1 1 0 0 0
#  2 1 0 0 0
#  3 0 1 0 0
#  4 0 1 0 0
#  5 0 0 1 0
#  6 0 0 0 1
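
The result of table() is a contingency table rather than a data frame. If a data frame is needed, as the question's expected output suggests, one possible sketch is to call the matrix method of as.data.frame() directly. The names tab and b are only illustrative.

```r
# Same data as in the question, built with base R so the sketch is self-contained.
a <- data.frame(type = c("A", "A", "B", "B", "C", "D"))

# Cross-tabulate the row index against the category: one row per observation.
tab <- table(seq_len(nrow(a)), a$type)

# as.data.frame() on a table would return long format (Var1, Var2, Freq);
# calling the matrix method directly keeps the wide 0/1 layout.
b <- as.data.frame.matrix(tab)
print(b)
```

The resulting columns are integer counts, which are 0 or 1 here because each row index occurs exactly once.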
