Using tidyr or similar to make a tall data set wide, while collapsing multiple values into a vector

Question

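I have a set of data from Matlab that I would like to use in R. I have a set of subjects, and a set of conditions within each subject. Under each condition, each subject produces some data. I have written it out as a "tall" table, like this: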

    subject   condition   data
#1  id1       cond1       0.12
#2  id1       cond1       0.43
#3  id1       cond2       1.26
#4  id2       cond1       1.96
#5  id2       cond2       0.24
#6  id2       cond2       0.62
...

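As you can see, one problem is that the number of values per condition differs from subject to subject, and the number of values also differs between conditions within a subject. I am interested in how these values are distributed across subjects, so I would like to keep the raw values as a list in a "wide" data frame, like this: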

    subject   condition   data
#1  id1       cond1       c(0.12, 0.43)
#2  id1       cond2       c(1.26)
#3  id2       cond1       c(1.96)
#4  id2       cond2       c(0.24, 0.62)
...

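What is the best way to do this? I have used tidyr::spread() in the past, but it does not work here without a variable that uniquely identifies each row, and even if I add one, I don't see how it would work.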

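I have also tried dplyr::group_by(data, subject, condition), but I'm not sure how to proceed from there. Is it possible to summarise the grouped table by using c() as the summary function...? That did not work for me.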

As always, thanks for your help!

Answer 1

You can use aggregate() to create a list column, data, whose elements are numeric vectors.

aggregate(data ~ subject + condition, FUN = list, data = df)
#  subject condition       data
#1     id1     cond1 0.12, 0.43
#2     id2     cond1       1.96
#3     id1     cond2       1.26
#4     id2     cond2 0.24, 0.62
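To illustrate how the resulting list column can be used, here is a minimal sketch. It assumes a data frame `df` rebuilt from the six rows shown in the question (the answer does not define `df`), and the summary columns `n` and `mean` are added purely for illustration.

```r
# Assumed input: `df` rebuilt from the rows shown in the question.
df <- data.frame(
  subject   = c("id1", "id1", "id1", "id2", "id2", "id2"),
  condition = c("cond1", "cond1", "cond2", "cond1", "cond2", "cond2"),
  data      = c(0.12, 0.43, 1.26, 1.96, 0.24, 0.62)
)

# Collapse each subject/condition group into one element of a list column.
res <- aggregate(data ~ subject + condition, FUN = list, data = df)

# Each element of the list column is an ordinary numeric vector.
first_group <- res$data[[1]]

# Illustrative per-group summaries computed from the raw values.
res$n    <- lengths(res$data)
res$mean <- sapply(res$data, mean)
```

Because the raw values are kept, any other per-group statistic can be computed later with sapply() or lapply() over `res$data`.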

Answer 2

library(dplyr)
library(tidyr)

# Recreate the example data from the question
data = 
"subject   condition   data
id1       cond1       0.12
id1       cond1       0.43
id1       cond2       1.26
id2       cond1       1.96
id2       cond2       0.24
id2       cond2       0.62" %>%
  read.table(text = ., header = TRUE)

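For the wide form (one column per value within each subject/condition group):

wide_form = 
  data %>%
  group_by(subject, condition) %>%
  # number the values within each group: value1, value2, ...
  mutate(order = 1:n() %>% paste0("value", .)) %>%
  spread(order, data)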
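tidyr has since marked spread() as superseded in favour of pivot_wider(). The sketch below is not part of the original answer; it shows what should be an equivalent reshaping, assuming tidyr 1.0.0 or later and a data frame `df` rebuilt from the rows in the question.

```r
library(dplyr)
library(tidyr)

# Assumed input: `df` rebuilt from the rows shown in the question.
df <- data.frame(
  subject   = c("id1", "id1", "id1", "id2", "id2", "id2"),
  condition = c("cond1", "cond1", "cond2", "cond1", "cond2", "cond2"),
  data      = c(0.12, 0.43, 1.26, 1.96, 0.24, 0.62)
)

wide_form <- df %>%
  group_by(subject, condition) %>%
  # Number the values within each group: value1, value2, ...
  mutate(order = paste0("value", row_number())) %>%
  ungroup() %>%
  # One column per value position; groups with fewer values get NA.
  pivot_wider(names_from = order, values_from = data)
```

Groups with only one value end up with NA in `value2`, which is the main drawback of this layout when group sizes differ.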

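For the nested form (one row per subject/condition group, with the raw values kept in a list column):

nested_form = 
  data %>%
  group_by(subject, condition) %>%
  # wrap each group's values in a list so they fit in a single cell
  summarize(data = data %>% list)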
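As a variation that is not part of the original answer, the sketch below builds the same nested form with an ungrouped result, then goes one step further and spreads the conditions into separate list columns. It assumes dplyr 1.0.0 or later (for `.groups`), tidyr 1.1.0 or later (for passing a single function to `values_fn`), and a data frame `df` rebuilt from the rows in the question.

```r
library(dplyr)
library(tidyr)

# Assumed input: `df` rebuilt from the rows shown in the question.
df <- data.frame(
  subject   = c("id1", "id1", "id1", "id2", "id2", "id2"),
  condition = c("cond1", "cond1", "cond2", "cond1", "cond2", "cond2"),
  data      = c(0.12, 0.43, 1.26, 1.96, 0.24, 0.62)
)

# Nested form: one row per subject/condition, values kept in a list column.
nested_form <- df %>%
  group_by(subject, condition) %>%
  summarise(data = list(data), .groups = "drop")

# Wider variant: one row per subject, one list column per condition.
by_condition <- df %>%
  pivot_wider(names_from = condition, values_from = data, values_fn = list)

# The raw values for subject id2 under cond2.
id2_cond2 <- by_condition$cond2[[2]]
```

Which layout is preferable depends on the later analysis: the nested form keeps condition as a variable, while the wider variant makes it easier to compare conditions side by side within a subject.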
