Count occurrences based on three columns in R
I have a data frame:
chromosome_4 26869907 26895858 . 0.93 + mRNA ID=g2108.t2;Parent=g2108; chromosome_4 26887040 26887102 . 16.4 + Target Motif:Helitron-1_Dvir-Dmon-C 5 65 62 g2108
chromosome_4 26869907 26895858 . 0.93 + intron ID=g2108.t2;Parent=g2108; chromosome_4 26887363 26887481 . 17 + Target Motif:Helitron-1_Dmon 20 83 118 g2108
chromosome_4 26869907 26895858 . 0.93 + exon ID=g2108.t2;Parent=g2108; chromosome_4 26887528 26887618 . 27.6 + Target Motif:Helitron-11_Dmon 507 594 90 g2108
chromosome_4 26869907 26895858 . 0.93 + gene ID=g2108.t2;Parent=g2108; chromosome_4 26887618 26887648 . 18.9 - Target Motif:Helitron-2_Dvir-Dmon-C 21 51 30 g2108
chromosome_8 8522721 8540839 . 0.93 + intron ID=g2108.t2;Parent=g2108; chromosome_8 8522721 8540839 . 16.4 + Target Motif:Helitron-1_Dvir-Dmon-C 5 65 62 g2108
chromosome_8 8522721 8540839 . 0.93 + mRNA ID=g2108.t2;Parent=g2108; chromosome_8 8522721 8540839 . 17 + Target Motif:Helitron-1_Dmon 20 83 118 g608
chromosome_8 8522721 8540839 . 0.93 + intron ID=g2108.t2;Parent=g2108; chromosome_8 8522721 8540839 . 27.6 + Target Motif:Helitron-11_Dmon 507 594 90 g608
chromosome_8 8522721 8540839 . 0.93 + gene ID=g2108.t2;Parent=g2108; chromosome_8 8522721 8540839 . 18.9 - Target Motif:Helitron-2_Dvir-Dmon-C 21 51 30 g608
I want to count the motifs (col16) by chromosome (col1) and gene part (col7) for each unique gene (col20).
I tried this:
gene1 %>% filter(V1,V7,V20) %>% select(V16) %>% table
The error is:
Error in `filter()`:
! Problem while computing `..1 = V1`.
x Input `..1` must be a logical vector, not a character.
Run `rlang::last_error()` to see where the error occurred.
Finally, I need to be able to bind several such tables, belonging to different populations: bind_rows(gene1, gene2, gene3)
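I'm not 100% sure what your desired output should look like, but if you want to count the number of unique motifs in V16 for a given chromosome, part, and gene, you can use group_by() and summarize() from the dplyr package: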
Data
df <- data.frame(V1 = rep(paste0("chromosome_", c(4, 8)), each = 4),
                 V7 = c("mRNA", "intron", "exon", "gene"),  # recycled to fill all 8 rows
                 V16 = c("Motif:Helitron-1_Dvir-Dmon-C",
                         "Motif:Helitron-1_Dmon",
                         "Motif:Helitron-11_Dmon",
                         "Motif:Helitron-2_Dvir-Dmon-C",
                         "Motif:Helitron-2_Dvir-Dmon-C",
                         "Motif:Helitron-1_Dmon",
                         "Motif:Helitron-11_Dmon",
                         "Motif:Helitron-2_Dvir-Dmon-C"),
                 V20 = rep(c("g2108", "g608"), each = 4))
Code
library(dplyr)

df %>%
  group_by(V1, V7, V20) %>%                     # one group per chromosome / part / gene
  summarize(unique_motifs = n_distinct(V16),    # distinct motifs within each group
            .groups = "drop")
Output:
#   V1           V7     V20   unique_motifs
# 1 chromosome_4 exon   g2108             1
# 2 chromosome_4 gene   g2108             1
# 3 chromosome_4 intron g2108             1
# 4 chromosome_4 mRNA   g2108             1
# 5 chromosome_8 exon   g608              1
# 6 chromosome_8 gene   g608              1
# 7 chromosome_8 intron g608              1
# 8 chromosome_8 mRNA   g608              1
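On the original error: filter() keeps rows for which a logical condition is TRUE, so passing bare column names such as V1 fails; grouping is what is needed here. If the goal is how often each motif occurs, rather than how many distinct motifs there are, and the tables from several populations must be combined, something along these lines might work. This is only a sketch: the small gene1 and gene2 tables, the pop1/pop2 labels, and the population and n_hits column names are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-ins for the per-population tables (gene1, gene2 in the
# question); only the four columns used here are included.
gene1 <- data.frame(
  V1  = c("chromosome_4", "chromosome_4", "chromosome_4", "chromosome_8"),
  V7  = c("intron", "intron", "exon", "intron"),
  V16 = c("Motif:Helitron-1_Dmon", "Motif:Helitron-1_Dmon",
          "Motif:Helitron-11_Dmon", "Motif:Helitron-1_Dmon"),
  V20 = c("g2108", "g2108", "g2108", "g608")
)
gene2 <- data.frame(
  V1  = c("chromosome_4", "chromosome_8"),
  V7  = c("intron", "exon"),
  V16 = c("Motif:Helitron-1_Dmon", "Motif:Helitron-2_Dvir-Dmon-C"),
  V20 = c("g2108", "g608")
)

# Stack the tables; .id stores the list names in a new "population" column
# so each row remembers which table it came from.
all_genes <- bind_rows(list(pop1 = gene1, pop2 = gene2), .id = "population")

# count() is shorthand for group_by() + summarize(n = n()):
# one row per population / chromosome / part / gene / motif combination.
motif_counts <- all_genes %>%
  count(population, V1, V7, V20, V16, name = "n_hits")

print(motif_counts)
```

Binding first and counting afterwards keeps the population label as an ordinary grouping column, so the same pipeline works for any number of tables.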