r - data frame manipulation

Suppose I have this data frame:

df <- data.frame(ID = c("id1", "id1", "id1", "id2", "id2", "id3", "id3", "id3"),
                 Code = c("A", "B", "C", "A", "B", "A", "C", "D"),
                 Count = c(34,65,21,3,8,12,15,16), Value = c(3,1,8,2,3,3,5,8))

which looks like this:

> df
   ID Code Count Value
1 id1    A    34     3
2 id1    B    65     1
3 id1    C    21     8
4 id2    A     3     2
5 id2    B     8     3
6 id3    A    12     3
7 id3    C    15     5
8 id3    D    16     8

I would like to get this resulting data frame:

result <- data.frame(Code = c("A", "B", "C", "D"),
                     id1_count = c(34,65,21,NA), id1_value = c(3,1,8,NA),
                     id2_count = c(3, 8, NA, NA), id2_value = c(2, 3, NA, NA),
                     id3_count = c(12,NA,15,16), id3_value = c(3,NA,5,8))

which looks like this:

> result
  Code id1_count id1_value id2_count id2_value id3_count id3_value
1    A        34         3         3         2        12         3
2    B        65         1         8         3        NA        NA
3    C        21         8        NA        NA        15         5
4    D        NA        NA        NA        NA        16         8

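Is there a one-liner in base R that can do this? I was able to get the result I need, but not in an idiomatic R way (i.e. I used loops, etc.). Any help is appreciated. Thank you.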

You could try dcast from the development version of data.table (v1.9.5), which can take multiple value.var columns. Installation instructions are here

library(data.table)
dcast(setDT(df), Code~ID, value.var=c('Count', 'Value'))
#    Code Count_id1 Count_id2 Count_id3 Value_id1 Value_id2 Value_id3
#1:    A        34         3        12         3         2         3
#2:    B        65         8        NA         1         3        NA
#3:    C        21        NA        15         8        NA         5
#4:    D        NA        NA        16        NA        NA         8
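
The dcast output above groups the columns by measure (Count_id1, Count_id2, ...), whereas the question asks for id1_count, id1_value, and so on. A minimal sketch of renaming and reordering afterwards, assuming an installed data.table recent enough to accept several value.var columns; the variable name wide and the sub() pattern are illustrative choices, not part of the original answer:

```r
library(data.table)

# Input data from the question (stringsAsFactors = FALSE keeps Code and ID as character)
df <- data.frame(ID = c("id1", "id1", "id1", "id2", "id2", "id3", "id3", "id3"),
                 Code = c("A", "B", "C", "A", "B", "A", "C", "D"),
                 Count = c(34, 65, 21, 3, 8, 12, 15, 16),
                 Value = c(3, 1, 8, 2, 3, 3, 5, 8),
                 stringsAsFactors = FALSE)

# Cast both measure columns at once; names come out as Count_id1, ..., Value_id3
wide <- dcast(as.data.table(df), Code ~ ID, value.var = c("Count", "Value"))

# Illustrative rename "Count_id1" -> "id1_count": swap the two parts and
# lower-case the measure (\\L in a perl = TRUE replacement lower-cases what follows)
setnames(wide, sub("^(Count|Value)_(.*)$", "\\2_\\L\\1", names(wide), perl = TRUE))

# Interleave the columns per ID: Code, id1_count, id1_value, id2_count, ...
setcolorder(wide, c("Code", sort(setdiff(names(wide), "Code"))))
wide
```

setnames() and setcolorder() modify wide by reference, so no copy of the table is made.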

Or use reshape from base R:

reshape(df, idvar='Code', timevar='ID', direction='wide')
#    Code Count.id1 Value.id1 Count.id2 Value.id2 Count.id3 Value.id3
#1    A        34         3         3         2        12         3
#2    B        65         1         8         3        NA        NA
#3    C        21         8        NA        NA        15         5
#8    D        NA        NA        NA        NA        16         8
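
reshape() already produces the requested column order, only with different names (Count.id1 rather than id1_count), the original row names (1, 2, 3, 8) and an extra "reshapeWide" attribute. A sketch, assuming the goal is an exact match with the result data frame from the question, that renames the columns, drops those leftovers and compares with all.equal(); the variable name wide and the sub() pattern are illustrative:

```r
# Input and target data frames from the question
df <- data.frame(ID = c("id1", "id1", "id1", "id2", "id2", "id3", "id3", "id3"),
                 Code = c("A", "B", "C", "A", "B", "A", "C", "D"),
                 Count = c(34, 65, 21, 3, 8, 12, 15, 16),
                 Value = c(3, 1, 8, 2, 3, 3, 5, 8),
                 stringsAsFactors = FALSE)
result <- data.frame(Code = c("A", "B", "C", "D"),
                     id1_count = c(34, 65, 21, NA), id1_value = c(3, 1, 8, NA),
                     id2_count = c(3, 8, NA, NA), id2_value = c(2, 3, NA, NA),
                     id3_count = c(12, NA, 15, 16), id3_value = c(3, NA, 5, 8),
                     stringsAsFactors = FALSE)

# Base R long-to-wide reshape; columns come out as Count.id1, Value.id1, ...
wide <- reshape(df, idvar = "Code", timevar = "ID", direction = "wide")

# Illustrative rename "Count.id1" -> "id1_count" (perl = TRUE enables \\L)
names(wide) <- sub("^(Count|Value)\\.(.*)$", "\\2_\\L\\1", names(wide), perl = TRUE)

# Reset the row names inherited from df and drop the undo information
# that reshape() stores in the "reshapeWide" attribute
rownames(wide) <- NULL
attr(wide, "reshapeWide") <- NULL

isTRUE(all.equal(wide, result))
```

Removing the "reshapeWide" attribute means reshape(wide) can no longer reverse the operation automatically, which is fine when only the wide table is needed.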

You could also try:

library(tidyr)
library(dplyr)

df %>%
  gather(key, value, -(ID:Code)) %>%
  unite(id_key, ID, key) %>%
  spread(id_key, value)

which gives:

#  Code id1_Count id1_Value id2_Count id2_Value id3_Count id3_Value
#1    A        34         3         3         2        12         3
#2    B        65         1         8         3        NA        NA
#3    C        21         8        NA        NA        15         5
#4    D        NA        NA        NA        NA        16         8
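
gather() and spread() still work, but tidyr has since superseded them with pivot_longer() and pivot_wider(). A sketch of the same result with a single pivot_wider() call, assuming tidyr 1.2.0 or later (needed for the names_vary argument); the variable name wide is illustrative:

```r
library(tidyr)  # assumption: tidyr >= 1.2.0 for names_vary

# Input data from the question
df <- data.frame(ID = c("id1", "id1", "id1", "id2", "id2", "id3", "id3", "id3"),
                 Code = c("A", "B", "C", "A", "B", "A", "C", "D"),
                 Count = c(34, 65, 21, 3, 8, 12, 15, 16),
                 Value = c(3, 1, 8, 2, 3, 3, 5, 8),
                 stringsAsFactors = FALSE)

# Spread both measure columns by ID in one call:
# names_glue builds "id1_Count"-style names, and names_vary = "slowest"
# keeps each ID's Count and Value columns next to each other
wide <- pivot_wider(df,
                    id_cols = Code,
                    names_from = ID,
                    values_from = c(Count, Value),
                    names_glue = "{ID}_{.value}",
                    names_vary = "slowest")
wide
```

The result is a tibble; wrap it in as.data.frame() if a plain data frame is needed.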