Summarizing multiple columns based on one column in R
I have a data frame that looks like this:
TCGA_Name | Full_Name | Gene.Name |
---|---|---|
Thyroid Carcinoma | Papillary Thyroid Cancer | NRAS |
Thyroid Carcinoma | Thyroid Gland Carcinoma | NRAS |
Sarcoma | Uterine leiomyosarcoma | PIK3CA |
Sarcoma | Sarcoma | PIK3CA |
Ovarian Serous Cystadenocarcinoma | High Grade Serous Ovarian Cancer | PIK3CA |
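I am trying to reduce the number of rows based on TCGA_Name. I want to combine the Full_Name cancer types when they have the same TCGA_Name and share the same Gene.Name. The final product should look like this: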
TCGA_Name | Full_Name | Gene.Name |
---|---|---|
Thyroid Carcinoma | Papillary Thyroid Cancer, Thyroid Gland Carcinoma | NRAS |
Sarcoma | Uterine leiomyosarcoma, Sarcoma | PIK3CA |
Ovarian Serous Cystadenocarcinoma | High Grade Serous Ovarian Cancer | PIK3CA |
So far I have this:
```r
library(plyr)
# Groups by TCGA_Name only, so Gene.Name is not carried into the result
df1 <- ddply(df1, .(TCGA_Name), summarize, text = paste(Full_Name, collapse = ", "))
```
But this drops the Gene.Name column.
As always, any help is much appreciated!
Is this what you want?
```r
df1 <- ddply(df1, .(TCGA_Name, Gene.Name), summarize, text = paste(Full_Name, collapse = ", "))
```
Just add `Gene.Name` to the grouping variables.
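For anyone who wants to try this on the sample data, here is a minimal reproducible sketch. It assumes `df1` holds exactly the five rows shown in the question. The collapsed column is named `Full_Name` rather than `text` so the result matches the requested layout, and `res` and `res_base` are illustrative names. The `aggregate()` call at the end is a base R alternative that is not part of the answer above, included only for setups without plyr.

```r
library(plyr)

# Sample data rebuilt from the table in the question (assumed to match df1)
df1 <- data.frame(
  TCGA_Name = c("Thyroid Carcinoma", "Thyroid Carcinoma",
                "Sarcoma", "Sarcoma",
                "Ovarian Serous Cystadenocarcinoma"),
  Full_Name = c("Papillary Thyroid Cancer", "Thyroid Gland Carcinoma",
                "Uterine leiomyosarcoma", "Sarcoma",
                "High Grade Serous Ovarian Cancer"),
  Gene.Name = c("NRAS", "NRAS", "PIK3CA", "PIK3CA", "PIK3CA"),
  stringsAsFactors = FALSE
)

# Group by both columns so Gene.Name is kept, then collapse Full_Name per group
res <- ddply(df1, .(TCGA_Name, Gene.Name), summarize,
             Full_Name = paste(Full_Name, collapse = ", "))

# Reorder the columns to match the layout requested in the question
res <- res[, c("TCGA_Name", "Full_Name", "Gene.Name")]
print(res)

# Base R alternative (not from the answer above): same grouping via aggregate()
res_base <- aggregate(Full_Name ~ TCGA_Name + Gene.Name, data = df1,
                      FUN = function(x) paste(x, collapse = ", "))
print(res_base)
```

Both calls should give one row per TCGA_Name/Gene.Name pair. The rows may come back in a different order from the original data frame, so look rows up by name instead of relying on position.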