How to transpose columns whilst creating new data in R?
I have a genetic dataset that looks like this:
Pathway Gene
Pathway1 Gene1
Pathway1 Gene2
Pathway2 Gene3
Pathway2 Gene1
Pathway3 Gene1
Pathway3 Gene4
Pathway3 Gene5
I would like to transpose the Pathway rows into columns, using 1 and 0 to track which genes are present in which pathway, to produce output like this:
Gene Pathway1 Pathway2 Pathway3
Gene1 1 1 1
Gene2 1 0 0
Gene3 0 1 0
Gene4 0 0 1
Gene5 0 0 1
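My real data is about 3000 rows long. I'm not confident in R, so I have been trying to use t(), but I'm not sure where to start with coding the binary counts I'm looking for. Any help or advice on functions to try would be appreciated.
Example input data: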
structure(list(Pathway = c("Pathway1", "Pathway1", "Pathway2",
"Pathway2", "Pathway3", "Pathway3", "Pathway3"), Gene = c("Gene1",
"Gene2", "Gene3", "Gene1", "Gene1", "Gene4", "Gene5")), row.names = c(NA,
-7L), class = c("data.table", "data.frame"))
A quick and dirty tidyverse solution:
library(tidyr)
# edit thanks to @Ronak Shah
df %>%
  pivot_wider(names_from = Pathway,
              values_from = Pathway,
              values_fn = length, values_fill = 0)
# A tibble: 5 x 4
Gene Pathway1 Pathway2 Pathway3
<chr> <dbl> <dbl> <dbl>
1 Gene1 1 1 1
2 Gene2 1 0 0
3 Gene3 0 1 0
4 Gene4 0 0 1
5 Gene5 0 0 1
A data.table approach:
library(data.table)
dcast(setDT(mydata), Gene ~ Pathway, value.var = "Pathway", fun.aggregate = length)
# Gene Pathway1 Pathway2 Pathway3
# 1: Gene1 1 1 1
# 2: Gene2 1 0 0
# 3: Gene3 0 1 0
# 4: Gene4 0 0 1
# 5: Gene5 0 0 1
You can use janitor::tabyl.
janitor::tabyl(df, Gene, Pathway)
# Gene Pathway1 Pathway2 Pathway3
# Gene1 1 1 1
# Gene2 1 0 0
# Gene3 0 1 0
# Gene4 0 0 1
# Gene5 0 0 1
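The answers above all count occurrences, so if the real data contains the same Gene/Pathway pair more than once, a cell could come out greater than 1. Here is a minimal base R sketch that forces the result to 0/1. It assumes the sample data is stored in a data frame called df, and the names counts, presence and result are only for illustration:

```r
# Sample data from the question, assigned to `df` (the name is an assumption).
df <- data.frame(
  Pathway = c("Pathway1", "Pathway1", "Pathway2", "Pathway2",
              "Pathway3", "Pathway3", "Pathway3"),
  Gene = c("Gene1", "Gene2", "Gene3", "Gene1", "Gene1", "Gene4", "Gene5")
)

# Cross-tabulate Gene (rows) by Pathway (columns); cells hold occurrence counts.
counts <- table(df$Gene, df$Pathway)

# Collapse counts to 0/1 so that repeated Gene/Pathway pairs still give 1.
presence <- (counts > 0) * 1L

# Turn the table into a regular data frame with Gene as the first column.
result <- as.data.frame.matrix(presence)
result <- cbind(Gene = rownames(result), result)
rownames(result) <- NULL

print(result)
```

This needs no extra packages, which might be convenient if you are new to R. If your data is guaranteed to have no duplicate pairs, the `counts > 0` step changes nothing and any of the answers above should give the same table.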