Reshape the data to result in only a single row

I have a data frame (df) in the following long/tall format:

Input:

ID  entity_id  type
A1  1001       husband
A1  1002       wife
A1  1003       brother
A1  1004       son
A2  2005       husband
A2  2006       son

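I want this in wide format, so I did the following.

Because reshape2 can't handle duplicates (it defaults to counting them), I added a dummy column:

library(reshape2)

# give every row its own number so that no cell receives more than one value
df$dummy <- seq_len(nrow(df))

df_wide <- dcast(df, dummy + ID ~ type, value.var = "entity_id")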

This is what I get:

dummy ID  husband wife  brother son
1     A1  1001    NA    NA      NA
2     A1  NA      1002  NA      NA
3     A1  NA      NA    1003    NA

What I want:

dummy ID  husband wife brother son
1     A1  1001    1002 1003    1004
2     A2  2005    NA   NA      2006  
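Here is a quick way to see where the extra rows come from. `dcast()` produces one output row for each distinct combination of the variables on the left-hand side of the formula. Because `dummy` differs on every row, `dummy + ID` has as many distinct combinations as there are records, while `ID` alone has one per household. The base R check below rebuilds the input by hand and counts both. The names `n_keys_dummy` and `n_keys_id` are only illustrative.

```r
# Input data from the question, rebuilt by hand
df <- data.frame(
  ID        = c("A1", "A1", "A1", "A1", "A2", "A2"),
  entity_id = c(1001, 1002, 1003, 1004, 2005, 2006),
  type      = c("husband", "wife", "brother", "son", "husband", "son"),
  stringsAsFactors = FALSE
)
df$dummy <- seq_len(nrow(df))   # unique per row, as in the question

# Rows of the cast result = distinct values of the formula's left-hand side
n_keys_dummy <- nrow(unique(df[c("dummy", "ID")]))  # 6: one per record
n_keys_id    <- length(unique(df$ID))                # 2: one per ID
```

Dropping `dummy` from the left-hand side is therefore what collapses the result to one row per ID, as long as each (ID, type) pair occurs only once.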

EDIT 1: sessionInfo()

R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidyr_0.4.1    reshape2_1.4.1 dplyr_0.4.3    RMySQL_0.10.8  DBI_0.3.1     

loaded via a namespace (and not attached):
[1] plyr_1.8.3     magrittr_1.5   R6_2.1.2       assertthat_0.1 parallel_3.2.4 tools_3.2.4    Rcpp_0.12.4    stringi_1.0-1  stringr_1.0.0 

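I'm not sure I fully understand why you added the dummy column (I assume you meant to write df$dummy rather than df_dummy). However, the following seems to give the result you're looking for: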

library(reshape2)

df <- read.delim(text="ID  entity_id  type
                 A1  1001       husband
                 A1  1002       wife
                 A1  1003       brother
                 A1  1004       son
                 A2  2005       husband
                 A2  2006       son", sep="")

dcast(df, ID ~ type, value.var="entity_id")
  ID brother husband  son wife
1 A1    1003    1001 1004 1002
2 A2      NA    2005 2006   NA
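For comparison, base R's `stats::reshape()` can produce the same layout without reshape2. This is a swapped-in alternative, not the approach the answer uses. `reshape()` prefixes the new columns with `entity_id.` and orders them by first appearance rather than alphabetically, so the sketch strips the prefix afterwards.

```r
df <- data.frame(
  ID        = c("A1", "A1", "A1", "A1", "A2", "A2"),
  entity_id = c(1001, 1002, 1003, 1004, 2005, 2006),
  type      = c("husband", "wife", "brother", "son", "husband", "son"),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per ID, one entity_id column per type;
# types missing for an ID (wife, brother for A2) become NA
wide <- reshape(df, idvar = "ID", timevar = "type", direction = "wide")
names(wide) <- sub("^entity_id\\.", "", names(wide))  # drop "entity_id." prefix
rownames(wide) <- NULL
wide
```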

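EDIT: Given your revised data, which has multiple brothers and sons, I'd suggest the following (assuming you still want everything in a single row per ID):

Solution 1: put everything into a single cell: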

df <- read.delim(text="ID  entity_id  type
A1  1001       husband
A1  1002       wife
A1  1003       brother
A1  1005       brother
A1  1004       son
A1  1006       son
A2  2005       husband
A2  2006       son", sep="")

dcast(df, ID ~ type, value.var="entity_id", 
      fun.aggregate = function(...) paste0(..., collapse = "_"))
  ID   brother husband       son wife
1 A1 1003_1005    1001 1004_1006 1002
2 A2              2005      2006     

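Here I aggregate multiple instances by pasting their IDs together. I don't know what you plan to do with this table later, so I can't tell whether this is useful to you; I just want to point out one possibility. Needless to say, you can change the aggregation function to suit your needs. For example, instead of pasting the IDs together, you could put them in a list: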

dcast(df, ID ~ type, value.var="entity_id", fun.aggregate = list)
  ID    brother husband        son wife
1 A1 1003, 1005    1001 1004, 1006 1002
2 A2               2005       2006     
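If you'd rather avoid reshape2, base R's `tapply()` gives roughly the same two results as a matrix indexed by ID and type. This is a substitute technique, not what `dcast()` does internally. One difference: empty cells come back as `NA` (or `NULL` in the list version) instead of the empty string that `dcast()` shows above. A sketch on the revised data:

```r
df <- data.frame(
  ID        = c("A1", "A1", "A1", "A1", "A1", "A1", "A2", "A2"),
  entity_id = c(1001, 1002, 1003, 1005, 1004, 1006, 2005, 2006),
  type      = c("husband", "wife", "brother", "brother", "son", "son",
                "husband", "son"),
  stringsAsFactors = FALSE
)

# One cell per (ID, type); several entity_ids in a cell are joined with "_"
pasted <- tapply(df$entity_id, list(df$ID, df$type), paste, collapse = "_")

# Same grid, but each cell keeps its values as a vector (a list-matrix)
listed <- tapply(df$entity_id, list(df$ID, df$type), c, simplify = FALSE)

pasted
listed[["A1", "son"]]
```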

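Solution 2: add a column that numbers repeated types:

library(dplyr)
library(reshape2)
# number repeated types within each ID: brother_1, brother_2, son_1, ...
new.df <- df %>% group_by(ID, type) %>%
                 mutate(type_num = paste(type, seq_len(n()), sep = "_")) %>%
                 as.data.frame()   # plain data.frame for dcast
dcast(new.df, ID ~ type_num, value.var = "entity_id")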
  ID brother_1 brother_2 husband_1 son_1 son_2 wife_1
1 A1      1003      1005      1001  1004  1006   1002
2 A2        NA        NA      2005  2006    NA     NA
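The same counter idea works in base R. `ave()` with `FUN = seq_along` numbers the rows within each (ID, type) group, and `stats::reshape()` then spreads the numbered types into columns. The sketch assumes you don't need dplyr or reshape2 for this step. Unlike the `dcast()` output above, the resulting columns are named like `entity_id.brother_2`.

```r
df <- data.frame(
  ID        = c("A1", "A1", "A1", "A1", "A1", "A1", "A2", "A2"),
  entity_id = c(1001, 1002, 1003, 1005, 1004, 1006, 2005, 2006),
  type      = c("husband", "wife", "brother", "brother", "son", "son",
                "husband", "son"),
  stringsAsFactors = FALSE
)

# Running count within each (ID, type) group: 1, 2, ... in row order
counter <- ave(seq_along(df$ID), df$ID, df$type, FUN = seq_along)
df$type_num <- paste(df$type, counter, sep = "_")

# Long -> wide with one column per numbered type; absent ones become NA
wide <- reshape(df[c("ID", "entity_id", "type_num")], idvar = "ID",
                timevar = "type_num", direction = "wide")
wide
```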

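This was a major oversight on my part, but I'm posting it for the benefit of future readers like me.

The problem above only appears when the same type has more than one entry. Unlike the example above, my actual data looks like this: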

ID  entity_id  type
A1  1001       husband
A1  1002       wife
A1  1003       brother
A1  1005       brother
A1  1004       son
A1  1006       son
A2  2005       husband
A2  2006       son

Note that A1 has two brothers and two sons.

Since dcast can't work out how to resolve these duplicates, it ends up creating additional rows.
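Before casting, it can help to check whether `ID` and `type` together identify each row uniquely. If they don't, `dcast(df, ID ~ type, ...)` has to aggregate, and without `fun.aggregate` reshape2 falls back to `length` (with the message "Aggregation function missing: defaulting to length"). A base R sketch of that check on the data above; `has_dups` and `counts` are illustrative names:

```r
df <- data.frame(
  ID        = c("A1", "A1", "A1", "A1", "A1", "A1", "A2", "A2"),
  entity_id = c(1001, 1002, 1003, 1005, 1004, 1006, 2005, 2006),
  type      = c("husband", "wife", "brother", "brother", "son", "son",
                "husband", "son"),
  stringsAsFactors = FALSE
)

# TRUE if some (ID, type) pair occurs more than once
has_dups <- any(duplicated(df[c("ID", "type")]))

# How many entity_ids fall into each (ID, type) cell; keep the repeated ones
counts <- aggregate(entity_id ~ ID + type, data = df, FUN = length)
counts[counts$entity_id > 1, ]
```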