Reshaping long to wide in R with categorical variables

I have a data frame like the one below, with Year and ID identifiers plus a number of categorical variables (values are shown as capital letters):

Year   ID   Var1   Var2  Var3 ...

1996   1    A      A     B
1996   1    A      A     C
1996   2    B      A     D
1998   2    C      C     A
2000   3    D      D     D

My goal is to reshape it to wide format by ID, with a count for each combination of variable, year, and value. So, for example:

ID    Var1_1996_A  Var1_1996_B  Var1_1996_C   Var1_1996_D ...

1     2            0            0             0
2     0            1            0             0
3     0            0            0             0

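And so on for each variable. I'm new to R and couldn't find a similar operation in existing posts (apologies if this is a duplicate). Does anyone know the best way to achieve this? I tried tidyr::pivot_wider, but I could only work out how to append the year, not how to create a separate column for each variable's response category: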

df <- df %>%
    pivot_wider(names_from = Year,
                values_from = c(Var1, Var2, Var3, Var4, Var5))

Any insight would be greatly appreciated.

First get the data into long format, then pivot it wide again, counting the rows that fall into each cell:

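library(tidyr)

df %>%
  # one row per (Year, ID, variable name, value)
  pivot_longer(cols = starts_with('Var')) %>%
  # one column per name_Year_value; count rows per cell, 0 where none
  pivot_wider(names_from = c(name, Year, value), values_from = name, 
              values_fn = length, values_fill = 0)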

#     ID Var1_1996_A Var2_1996_A Var3_1996_B Var3_1996_C Var1_1996_B Var3_1996_D
#  <int>       <int>       <int>       <int>       <int>       <int>       <int>
#1     1           2           2           1           1           0           0
#2     2           0           1           0           0           1           1
#3     3           0           0           0           0           0           0
# … with 6 more variables: Var1_1998_C <int>, Var2_1998_C <int>,
#   Var3_1998_A <int>, Var1_2000_D <int>, Var2_2000_D <int>, Var3_2000_D <int>

Data

df <- structure(list(Year = c(1996L, 1996L, 1996L, 1998L, 2000L), ID = c(1L, 
1L, 2L, 2L, 3L), Var1 = c("A", "A", "B", "C", "D"), Var2 = c("A", 
"A", "A", "C", "D"), Var3 = c("B", "C", "D", "A", "D")), 
class = "data.frame", row.names = c(NA, -5L))
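
Note that the pivot above only creates columns for combinations that actually occur in the data, so all-zero columns such as Var1_1996_C from the question are absent. If those are wanted too, one possible sketch, assuming tidyr 1.2.0 or later (where pivot_wider has a names_expand argument), is below. The helper column n and the names long and wide_full are introduced here only for illustration:

```r
library(tidyr)

df <- data.frame(
  Year = c(1996L, 1996L, 1996L, 1998L, 2000L),
  ID   = c(1L, 1L, 2L, 2L, 3L),
  Var1 = c("A", "A", "B", "C", "D"),
  Var2 = c("A", "A", "A", "C", "D"),
  Var3 = c("B", "C", "D", "A", "D")
)

# Long format: one row per (Year, ID, name, value)
long <- pivot_longer(df, cols = starts_with("Var"))

# Illustrative helper column: every long row counts once
long$n <- 1L

# names_expand = TRUE asks for a column for every combination of
# name, Year and value, including combinations never observed
wide_full <- pivot_wider(
  long,
  names_from   = c(name, Year, value),
  values_from  = n,
  values_fn    = sum,
  values_fill  = 0L,
  names_expand = TRUE
)
```

With three variables, three years and four values this should give 36 count columns plus ID. On larger data the number of columns grows quickly, so it may be worth expanding only when the empty columns are really needed.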

If you would rather use base R:

xtabs(~ID+v, transform(cbind(df[1:2], stack(df, -(1:2))), v = paste(ind, Year, values, sep="_")))

 v
ID  Var1_1996_A Var1_1996_B Var1_1998_C Var1_2000_D Var2_1996_A Var2_1998_C Var2_2000_D Var3_1996_B Var3_1996_C Var3_1996_D Var3_1998_A Var3_2000_D
  1           2           0           0           0           2           0           0           1           1           0           0           0
  2           0           1           1           0           1           1           0           0           0           1           1           0
  3           0           0           0           1           0           0           1           0           0           0           0           1

Of course, to convert this to a data.frame, you can use as.data.frame.matrix(...).
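
A minimal sketch of that conversion, split into steps for readability. The names long, tab and wide are illustrative, and stack(df[-(1:2)]) is used here in place of stack(df, -(1:2)), which should select the same columns:

```r
df <- data.frame(
  Year = c(1996L, 1996L, 1996L, 1998L, 2000L),
  ID   = c(1L, 1L, 2L, 2L, 3L),
  Var1 = c("A", "A", "B", "C", "D"),
  Var2 = c("A", "A", "A", "C", "D"),
  Var3 = c("B", "C", "D", "A", "D")
)

# Stack the Var columns; Year and ID are recycled to match
long <- cbind(df[1:2], stack(df[-(1:2)]))

# Build the column label, e.g. "Var1_1996_A"
long$v <- paste(long$ind, long$Year, long$values, sep = "_")

# Cross-tabulate ID against the label to get the counts
tab <- xtabs(~ ID + v, data = long)

# Convert the table to a plain data.frame, with ID as a real column
wide <- as.data.frame.matrix(tab)
wide <- cbind(ID = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL
```

as.data.frame.matrix keeps the wide layout, whereas calling as.data.frame directly on a table would return it in long form with a Freq column.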