Transform a long data frame to wide, spreading data across multiple columns in R
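I am struggling to transform a long data frame into a wide one, with a complication.
I have an ID column with repeated entries, each one referring to a time point for an individual. I also have other columns (namely visit, var1, and var2) that report data about each time point. Here is a reproducible example: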
df <- data.frame(ID=c(1,1,1,1,2,2,2,3,3,3),
visit=c(1,4,5,7,1,3,4,2,5,6),
var1=c("AF","no","no","no","AG","AG","no","BA","BA","BA"),
var2=c("good","good","good","bad","good","good","bad","good","good","good"))
which prints:
ID visit var1 var2
1 1 1 AF good
2 1 4 no good
3 1 5 no good
4 1 7 no bad
5 2 1 AG good
6 2 3 AG good
7 2 4 no bad
8 3 2 BA good
9 3 5 BA good
10 3 6 BA good
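I really need a data frame with only one row per ID and several columns for each of the other variables, for example with a numeric suffix (visit_1, visit_2, visit_3, and so on).
The output I have in mind looks like this: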
ID visit_1 visit_2 visit_3 visit_4 var1_1 var1_2 var1_3 var1_4 var2_1 var2_2 var2_3 var2_4
1 1 1 4 5 7 AF no no no good good good bad
2 2 1 3 4 NA AG AG no <NA> good good bad <NA>
3 3 2 5 6 NA BA BA BA <NA> good good good <NA>
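where each entry of the visit, var1, and var2 columns is placed in its own sequentially numbered column, per ID.
I have tried data.table::dcast and tidyr::spread, as well as pivot_wider(), but these functions seem to create columns based on the actual values rather than a fixed set of numbered columns. For example, using pivot_wider():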
df %>% pivot_wider(names_from = ID, values_from = c("visit","var1","var2"))
It returns the following warnings:
Warning messages:
1: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
2: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
3: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
Can anyone help?
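You have to add a sequential ID within each group and take the column names from it, so that each combination of ID and counter identifies exactly one row: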
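library(tidyverse)

df %>%
  group_by(ID) %>%
  # number the rows within each ID: 1, 2, 3, ...
  mutate(count = row_number()) %>%
  # name id_cols explicitly; newer tidyr versions deprecate passing it by position
  pivot_wider(id_cols = ID, names_from = count, values_from = c(visit, var1, var2))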
# A tibble: 3 x 13
# Groups: ID [3]
# ID visit_1 visit_2 visit_3 visit_4 var1_1 var1_2 var1_3 var1_4 var2_1 var2_2 var2_3 var2_4
# <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 1 1 4 5 7 AF no no no good good good bad
#2 2 1 3 4 NA AG AG no NA good good bad NA
#3 3 2 5 6 NA BA BA BA NA good good good NA
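If you would rather avoid extra packages, the same idea (a per-ID counter used as the "time" variable) can be sketched in base R with ave() and reshape(). This is only a minimal sketch of an alternative, not a replacement for the answer above; the column name seq and the result name wide are arbitrary choices made for this illustration. Note that reshape() interleaves the columns by counter (visit_1, var1_1, var2_1, visit_2, ...) rather than grouping them by variable.

```r
# Same example data as in the question
df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
                 visit = c(1, 4, 5, 7, 1, 3, 4, 2, 5, 6),
                 var1 = c("AF", "no", "no", "no", "AG", "AG", "no", "BA", "BA", "BA"),
                 var2 = c("good", "good", "good", "bad", "good", "good", "bad",
                          "good", "good", "good"))

# Per-ID row counter (1, 2, 3, ... within each ID); "seq" is a name chosen here
df$seq <- ave(df$ID, df$ID, FUN = seq_along)

# Spread every remaining column by the counter; sep = "_" yields visit_1, var1_1, ...
wide <- reshape(df, idvar = "ID", timevar = "seq", direction = "wide", sep = "_")

print(wide)
```

IDs with fewer than four visits get NA in the unused columns, as in the pivot_wider() result.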