Reshape the data to result in only a single row
I have a data frame (df) in the following long/tall format.
Input:
ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1004 son
A2 2005 husband
A2 2006 son
I want this in wide format, so I did the following.
Because reshape can't handle duplicates (it defaults to a count), I added a dummy column:
df$dummy <- seq_len(nrow(df))
df_wide <- dcast(df, dummy + ID ~ type, value.var="entity_id")
This is what I get:
dummy ID husband wife brother son
1 A1 1001 NA NA NA
2 A1 NA 1002 NA NA
3 A1 NA NA 1003 NA
What I want:
dummy ID husband wife brother son
1 A1 1001 1002 1003 1004
2 A2 2005 NA NA 2006
EDIT 1: sessionInfo()
R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidyr_0.4.1 reshape2_1.4.1 dplyr_0.4.3 RMySQL_0.10.8 DBI_0.3.1
loaded via a namespace (and not attached):
[1] plyr_1.8.3 magrittr_1.5 R6_2.1.2 assertthat_0.1 parallel_3.2.4 tools_3.2.4 Rcpp_0.12.4 stringi_1.0-1 stringr_1.0.0
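I'm not sure I fully understand why you added the dummy column (I assume you meant to write df$dummy rather than df_dummy). But the following seems to give the result you are looking for: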
library(reshape2)
df <- read.delim(text="ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1004 son
A2 2005 husband
A2 2006 son", sep="")
dcast(df, ID ~ type, value.var="entity_id")
ID brother husband son wife
1 A1 1003 1001 1004 1002
2 A2 NA 2005 2006 NA
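To see why dropping the dummy column helps: dcast builds one output row per distinct combination of the variables on the left-hand side of the formula. The sketch below uses base R only and simply counts those combinations for both formulas; the names n_rows_with_dummy and n_rows_id_only are made up for illustration.

```r
# Same sample data as in the question.
df <- read.table(text = "ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1004 son
A2 2005 husband
A2 2006 son", header = TRUE, stringsAsFactors = FALSE)

# The dummy column from the question: a unique value for every input row.
df$dummy <- seq_len(nrow(df))

# Count the distinct left-hand-side combinations for each formula.
n_rows_with_dummy <- nrow(unique(df[, c("dummy", "ID")]))   # dummy + ID ~ type
n_rows_id_only    <- nrow(unique(df[, "ID", drop = FALSE])) # ID ~ type

print(c(with_dummy = n_rows_with_dummy, id_only = n_rows_id_only))
```

Because dummy is unique per row, every input row becomes its own output row, with NA in all but one column. With ID alone on the left-hand side there are only two combinations, hence two rows.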
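Edit: Based on your revised data, which has multiple brothers and sons, I suggest the following (assuming you still want everything in a single row):
Solution 1: Put everything into one cell: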
df <- read.delim(text="ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1005 brother
A1 1004 son
A1 1006 son
A2 2005 husband
A2 2006 son", sep="")
dcast(df, ID ~ type, value.var="entity_id",
fun.aggregate = function(...) paste0(..., collapse = "_"))
ID brother husband son wife
1 A1 1003_1005 1001 1004_1006 1002
2 A2 2005 2006
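Here I aggregate the multiple instances by pasting their IDs together. I don't know what you want to do with this later, so I can't say whether this is useful to you. I just wanted to point out one possibility. Needless to say, you can change the aggregation function to suit your needs. For example, you could put them into a list instead of pasting them together.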
dcast(df, ID ~ type, value.var="entity_id", fun.aggregate = list)
ID brother husband son wife
1 A1 1003, 1005 1001 1004, 1006 1002
2 A2 2005 2006
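If reshape2 is not available, roughly the same "one cell per ID and type" aggregation can be sketched in base R with tapply. This is a swapped-in technique, not what dcast does internally. The result is a character matrix rather than a data frame, and empty cells come back as NA instead of an empty string.

```r
# Revised sample data with two brothers and two sons for A1.
df <- read.table(text = "ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1005 brother
A1 1004 son
A1 1006 son
A2 2005 husband
A2 2006 son", header = TRUE, stringsAsFactors = FALSE)

# Cross-tabulate by ID (rows) and type (columns), collapsing all
# entity_ids that fall into the same cell into one "_"-separated string.
wide <- tapply(df$entity_id, list(df$ID, df$type), paste, collapse = "_")

print(wide)
```

Swapping paste for another function that returns a single value (toString, min, length, and so on) changes the aggregation in the same way that changing fun.aggregate does.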
Solution 2: Add columns:
library(dplyr)
new.df <- df %>% group_by(ID, type) %>%
mutate(type_num = paste(type, 1:n(), sep="_"))
dcast(new.df, ID ~ type_num, value.var="entity_id")
ID brother_1 brother_2 husband_1 son_1 son_2 wife_1
1 A1 1003 1005 1001 1004 1006 1002
2 A2 NA NA 2005 2006 NA NA
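The same idea works without dplyr or reshape2. The sketch below is a base R alternative, not the method used above: ave() numbers the repeated types within each ID, and reshape() spreads them. Note that reshape() prefixes the new columns with the value column's name (entity_id.brother_1 and so on) and keeps them in order of first appearance rather than alphabetically.

```r
# Revised sample data with two brothers and two sons for A1.
df <- read.table(text = "ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1005 brother
A1 1004 son
A1 1006 son
A2 2005 husband
A2 2006 son", header = TRUE, stringsAsFactors = FALSE)

# Running index within each (ID, type) group: 1 for the first brother,
# 2 for the second, and so on.
idx <- ave(seq_along(df$type), df$ID, df$type, FUN = seq_along)
df$type_num <- paste(df$type, idx, sep = "_")

# One row per ID, one column per numbered type.
wide <- reshape(df[, c("ID", "entity_id", "type_num")],
                idvar = "ID", timevar = "type_num", direction = "wide")

print(wide)
```

The df with its type_num column can also be passed to dcast(df, ID ~ type_num, value.var = "entity_id") exactly as in Solution 2.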
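A major oversight on my part, but I'm posting this for the benefit of future readers in the same situation.
The problem above only occurs when there are multiple entries of the same type. My actual data, unlike the example above, looks like this: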
ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1005 brother
A1 1004 son
A1 1006 son
A2 2005 husband
A2 2006 son
Note that there are two sons and two brothers.
Since 'dcast' can't figure out how to resolve this, it ends up creating another row.
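A quick way to check for this situation before reshaping is to count the entries per ID and type. Any count above 1 means the formula ID ~ type does not identify a single value per cell, so some form of aggregation or extra key is needed. A minimal base R sketch (the names counts, needs_aggregation and dupes are made up for illustration):

```r
# Revised sample data with two brothers and two sons for A1.
df <- read.table(text = "ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1005 brother
A1 1004 son
A1 1006 son
A2 2005 husband
A2 2006 son", header = TRUE, stringsAsFactors = FALSE)

# Number of entries for every ID/type combination.
counts <- table(df$ID, df$type)
needs_aggregation <- any(counts > 1)

# List only the combinations that occur more than once.
dupes <- as.data.frame(counts, stringsAsFactors = FALSE)
names(dupes) <- c("ID", "type", "n")
dupes <- dupes[dupes$n > 1, ]

print(dupes)
```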