Summarize two tables into a new one in R
I have these two different tables:
#table1 all patients that have RNASeq data
experimental.strategy | submitter.id
RNA-Seq | TCGA-AA-3867
RNA-Seq | TCGA-F4-6809
RNA-Seq | TCGA-AA-3562
...
#table2 all patients that have miRNASeq data
experimental.strategy | submitter.id
miRNA-Seq | TCGA-A6-6650
miRNA-Seq | TCGA-AZ-4308
miRNA-Seq | TCGA-AA-A02Y
...
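Some patients have only one type of data available, so I have three kinds of patients: those with only RNA-Seq data, those with only miRNA-Seq data, and those with both miRNA-Seq and RNA-Seq data available.
I want to create a new table that contains all patient IDs and summarizes the data from these tables, like this: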
submitter.id | miRNA-Seq | RNA-Seq | Paired
TCGA-4T-AA8H | 0 | 1 | 0
TCGA-5M-AAT5 | 1 | 1 | 1
TCGA-3L-AA1B | 1 | 0 | 0
TCGA-AA-A02Y | 0 | 1 | 0
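How can I do this?
Why do you need a third table when you can get all the required information from Table 1 and Table 2? Use full_join (from dplyr) to merge Table 1 and Table 2 and get the desired result, as shown below.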
library(dplyr)    # rename, mutate, full_join, coalesce, if_else
library(stringr)  # str_replace

## Input Data
df1 <- read.table(text = "experimental.strategy submitter.id
RNA-Seq TCGA-AA-3867
RNA-Seq TCGA-F4-6809
RNA-Seq TCGA-AA-3562", header = TRUE)
df2 <- read.table(text = "experimental.strategy submitter.id
miRNA-Seq TCGA-AA-3867
miRNA-Seq TCGA-F4-6809
miRNA-Seq TCGA-AA-A02Y", header = TRUE)

## Turn each strategy column into a numeric 1 flag named after the data type
df1 <- df1 %>% rename(RNA_Seq = experimental.strategy) %>%
  mutate(RNA_Seq = str_replace(RNA_Seq, "RNA-Seq", "1")) %>%
  mutate(RNA_Seq = as.numeric(RNA_Seq))
df2 <- df2 %>% rename(miRNA_Seq = experimental.strategy) %>%
  mutate(miRNA_Seq = str_replace(miRNA_Seq, "miRNA-Seq", "1")) %>%
  mutate(miRNA_Seq = as.numeric(miRNA_Seq))

## Keep every patient from both tables, replace the NA flags of
## unmatched patients with 0, then mark patients that have both data types
df1 %>% full_join(df2, by = "submitter.id") %>%
  mutate_if(is.numeric, coalesce, 0) %>% group_by(submitter.id) %>%
  mutate(Paired = if_else(RNA_Seq == 1 & miRNA_Seq == 1, 1, 0))
## Output
RNA_Seq submitter.id miRNA_Seq Paired
<dbl> <chr> <dbl> <dbl>
1 1 TCGA-AA-3867 1 1
2 1 TCGA-F4-6809 1 1
3 1 TCGA-AA-3562 0 0
4 0 TCGA-AA-A02Y 1 0
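For comparison, the same summary can be sketched in base R without a join, by testing each patient ID for membership in each table. This is only a minimal sketch: the names rna_ids, mirna_ids, all_ids, and summary_tbl are illustrative and do not come from the answer above, and the IDs are simply the answer's sample data typed in by hand.

```r
# Hypothetical ID vectors, copied from the sample data in the answer above
rna_ids   <- c("TCGA-AA-3867", "TCGA-F4-6809", "TCGA-AA-3562")
mirna_ids <- c("TCGA-AA-3867", "TCGA-F4-6809", "TCGA-AA-A02Y")

# Every patient that appears in at least one table (union drops duplicates)
all_ids <- union(rna_ids, mirna_ids)

# %in% gives TRUE/FALSE per patient; as.integer() turns that into 1/0
summary_tbl <- data.frame(
  submitter.id = all_ids,
  miRNA_Seq    = as.integer(all_ids %in% mirna_ids),
  RNA_Seq      = as.integer(all_ids %in% rna_ids)
)

# Paired is 1 only when both data types are available for the patient
summary_tbl$Paired <- as.integer(summary_tbl$miRNA_Seq == 1 & summary_tbl$RNA_Seq == 1)

print(summary_tbl)
```

With real data, the two vectors would presumably be taken from the submitter.id column of each table. The join-based answer is likely the better fit when the tables carry further columns that should be kept.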