How to create multiple columns from one column, maybe using dcast or tidyverse

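I am learning R and trying to figure out how to split a column. I would like to spread my data from a single column into wide format. I was told to use dcast, but I haven't found the best way to do it, and I was going to try piping it through the tidyverse.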

# sample data
data <- data.frame(trimesterPeriod = c("first", "second", "third", "PP", "third", "second", "PP", "first"))
# dataframe 
  trimesterPeriod 
1 first
2 second
3 third
4 PP
5 third
6 second
7 PP
8 first

and I would like it to look like this:

#dataframe
ID     first       second       third       PP
1        1            0           0         0
2        0            1           0         0 
3        0            0           1         0
4        0            0           0         1 
5        0            0           1         0 
6        0            1           0         0 
7        0            0           0         1
8        1            0           0         0 

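I know I will have to change the trimesterPeriod data to character, but from there I am not sure where to go. I was thinking of doing something like this: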

data.frame %>%
    mutate(rn = row_number(first, second, third, PP)) %>%
    spread(trimesterPeriod) %>%
    select(-rn)

But I'm not sure. Any advice is greatly appreciated!

We can use table from base R
table(seq_len(nrow(data)), data$trimesterPeriod)

-output

    first PP second third
  1     1  0      0     0
  2     0  0      1     0
  3     0  0      0     1
  4     0  1      0     0
  5     0  0      0     1
  6     0  0      1     0
  7     0  1      0     0
  8     1  0      0     0
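
Note that table sorts the columns alphabetically (first, PP, second, third) and returns a table object rather than a data frame. A minimal sketch of one way to get the column order shown in the question, assuming that order is first, second, third, PP; the names period, tab and wide are made up for illustration:

```r
# Sample data as given in the question
data <- data.frame(
  trimesterPeriod = c("first", "second", "third", "PP",
                      "third", "second", "PP", "first")
)

# Assumption: the wanted column order is first, second, third, PP.
# A factor with explicit levels makes table() use that order.
period <- factor(data$trimesterPeriod,
                 levels = c("first", "second", "third", "PP"))

# Cross-tabulate the row index against the period
tab <- table(seq_len(nrow(data)), period)

# Convert the 2-d table to a data frame and add an explicit ID column
wide <- cbind(ID = seq_len(nrow(data)), as.data.frame.matrix(tab))
wide
```

Here as.data.frame.matrix keeps the wide layout; plain as.data.frame on a table would return it in long format instead.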

Or using tidyverse

library(dplyr)
library(tidyr)
data %>%
  mutate(ID = row_number()) %>%
  pivot_wider(names_from = trimesterPeriod,
              values_from = trimesterPeriod,
              values_fn = length,
              values_fill = 0)

-output

# A tibble: 8 × 5
     ID first second third    PP
  <int> <int>  <int> <int> <int>
1     1     1      0     0     0
2     2     0      1     0     0
3     3     0      0     1     0
4     4     0      0     0     1
5     5     0      0     1     0
6     6     0      1     0     0
7     7     0      0     0     1
8     8     1      0     0     0

data

data <- structure(list(trimesterPeriod = c("first", "second", "third", 
"PP", "third", "second", "PP", "first")),
 class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

Using dcast from data.table -

library(data.table)

dcast(setDT(data), seq_len(nrow(data)) ~ trimesterPeriod, 
      value.var = 'trimesterPeriod', fun.aggregate = length)

#   data PP first second third
#1:    1  0     1      0     0
#2:    2  0     0      1     0
#3:    3  0     0      0     1
#4:    4  1     0      0     0
#5:    5  0     0      0     1
#6:    6  0     0      1     0
#7:    7  1     0      0     0
#8:    8  0     1      0     0
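
In the output above the row-index column ends up named data and the period columns are in alphabetical order. A possible variation, sketched under the assumption that the column should be called ID and that the order from the question (first, second, third, PP) is wanted; the names dt and wide are made up for illustration:

```r
library(data.table)

# Sample data as given in the question
dt <- data.table(
  trimesterPeriod = c("first", "second", "third", "PP",
                      "third", "second", "PP", "first")
)

# Add an explicit row identifier so the left-hand side of the formula has a name
dt[, ID := .I]

# Count occurrences of each period per ID; absent combinations come out as 0
wide <- dcast(dt, ID ~ trimesterPeriod,
              value.var = "trimesterPeriod", fun.aggregate = length)

# Assumption: reorder the columns to match the layout asked for in the question
setcolorder(wide, c("ID", "first", "second", "third", "PP"))
wide
```

Creating the ID column first avoids relying on how dcast names an expression in the formula, and setcolorder reorders the columns by reference without copying the table.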