Summarizing multiple columns based on one column in R

I have a data frame that looks like this:

TCGA_Name                          Full_Name                         Gene.Name
Thyroid Carcinoma                  Papillary Thyroid Cancer          NRAS
Thyroid Carcinoma                  Thyroid Gland Carcinoma           NRAS
Sarcoma                            Uterine leiomyosarcoma            PIK3CA
Sarcoma                            Sarcoma                           PIK3CA
Ovarian Serous Cystadenocarcinoma  High Grade Serous Ovarian Cancer  PIK3CA

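I am trying to reduce the number of rows based on TCGA_Name. I want to combine the Full_Name cancer types when they have the same TCGA_Name and share the same Gene.Name. The final result should look like this: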

TCGA_Name                          Full_Name                                          Gene.Name
Thyroid Carcinoma                  Papillary Thyroid Cancer, Thyroid Gland Carcinoma  NRAS
Sarcoma                            Uterine leiomyosarcoma, Sarcoma                    PIK3CA
Ovarian Serous Cystadenocarcinoma  High Grade Serous Ovarian Cancer                   PIK3CA

So far I have this:

library(plyr)
df1 <- ddply(df1, .(TCGA_Name), summarize, text=paste(Hotspot_Name, collapse=", "))

But this drops the Gene.Name column.

As always, thank you very much for your help!

Is this what you want?

df1 <- ddply(df1, .(TCGA_Name,Gene.Name), summarize, text=paste(Full_Name, collapse=", "))

Just add 'Gene.Name' to the grouping variables, so it is kept as a column in the result.
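If you want to check the idea on the sample data without plyr, here is a minimal sketch that rebuilds the data frame from the question and groups by both columns using base R's aggregate(). Note that this is a swapped-in alternative to the ddply call, not the same function, and the object name res is only a placeholder chosen for illustration.

```r
# Rebuild the example data from the question (values copied from the post).
df1 <- data.frame(
  TCGA_Name = c("Thyroid Carcinoma", "Thyroid Carcinoma", "Sarcoma", "Sarcoma",
                "Ovarian Serous Cystadenocarcinoma"),
  Full_Name = c("Papillary Thyroid Cancer", "Thyroid Gland Carcinoma",
                "Uterine leiomyosarcoma", "Sarcoma",
                "High Grade Serous Ovarian Cancer"),
  Gene.Name = c("NRAS", "NRAS", "PIK3CA", "PIK3CA", "PIK3CA"),
  stringsAsFactors = FALSE
)

# Group by TCGA_Name and Gene.Name together, then paste the Full_Name
# values of each group into one comma-separated string.
# "res" is a hypothetical name; extra arguments (collapse) are passed to paste.
res <- aggregate(Full_Name ~ TCGA_Name + Gene.Name, data = df1,
                 FUN = paste, collapse = ", ")

print(res)
```

Because both TCGA_Name and Gene.Name are grouping variables, both survive in the output, and Full_Name keeps its original column name. The row order of the result may differ from the order in the question.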