Count occurrences based on three columns in R

Question

I have a data frame:

chromosome_4    26869907    26895858    .   0.93    +   mRNA    ID=g2108.t2;Parent=g2108;   chromosome_4    26887040    26887102    .   16.4    +   Target  Motif:Helitron-1_Dvir-Dmon-C    5   65  62  g2108
chromosome_4    26869907    26895858    .   0.93    +   intron  ID=g2108.t2;Parent=g2108;   chromosome_4    26887363    26887481    .   17      +   Target  Motif:Helitron-1_Dmon   20  83  118 g2108
chromosome_4    26869907    26895858    .   0.93    +   exon    ID=g2108.t2;Parent=g2108;   chromosome_4    26887528    26887618    .   27.6    +   Target  Motif:Helitron-11_Dmon  507 594 90  g2108
chromosome_4    26869907    26895858    .   0.93    +   gene    ID=g2108.t2;Parent=g2108;   chromosome_4    26887618    26887648    .   18.9    -   Target  Motif:Helitron-2_Dvir-Dmon-C    21  51  30  g2108
chromosome_8    8522721     8540839     .   0.93    +   intron  ID=g2108.t2;Parent=g2108;   chromosome_8    8522721     8540839     .   16.4    +   Target  Motif:Helitron-1_Dvir-Dmon-C    5   65  62  g2108
chromosome_8    8522721     8540839     .   0.93    +   mRNA    ID=g2108.t2;Parent=g2108;   chromosome_8    8522721     8540839     .   17      +   Target  Motif:Helitron-1_Dmon   20  83  118 g608
chromosome_8    8522721     8540839     .   0.93    +   intron  ID=g2108.t2;Parent=g2108;   chromosome_8    8522721     8540839     .   27.6    +   Target  Motif:Helitron-11_Dmon  507 594 90       g608
chromosome_8    8522721     8540839     .   0.93    +   gene    ID=g2108.t2;Parent=g2108;   chromosome_8    8522721     8540839     .   18.9    -   Target  Motif:Helitron-2_Dvir-Dmon-C    21  51  30  g608

I want to count the motifs (col16) by chromosome (col1) and gene part (col7) for each unique gene (col20). I tried this:

gene1 %>%  filter(V1,V7,V20) %>% select(V16) %>% table

The error is:

Error in `filter()`:

! Problem while computing `..1 = V1`.
x Input `..1` must be a logical vector, not a character.
Run `rlang::last_error()` to see where the error occurred.

In the end I need to be able to bind several such tables, each belonging to a different population: bind_rows(gene1, gene2, gene3)

Answer 1

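I'm not 100% sure what your desired output should be, but if you want to count the number of unique motifs in V16 for a given chromosome, gene part, and gene, you can use group_by() and summarize() from the dplyr package: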

Data

library(dplyr)

# Toy data mirroring the relevant columns of the question's table
df <- data.frame(V1 = rep(paste0("chromosome_", c(4,8)), each = 4),
                 V7 = c("mRNA", "intron","exon","gene"),  # recycled to fill all 8 rows
                 V16 = c("Motif:Helitron-1_Dvir-Dmon-C",
                         "Motif:Helitron-1_Dmon",
                         "Motif:Helitron-11_Dmon",
                         "Motif:Helitron-2_Dvir-Dmon-C",
                         "Motif:Helitron-2_Dvir-Dmon-C",
                         "Motif:Helitron-1_Dmon",
                         "Motif:Helitron-11_Dmon",
                         "Motif:Helitron-2_Dvir-Dmon-C"),
                 V20 = c(rep(c("g2108","g608"), each = 4)))

Code

# One row per chromosome / gene part / gene, with the number of distinct motifs
df %>% 
  group_by(V1, V7, V20) %>% 
  summarize(unique_motifs = n_distinct(V16), .groups = "drop")

Output:

#   V1           V7     V20   unique_motifs
# 1 chromosome_4 exon   g2108             1
# 2 chromosome_4 gene   g2108             1
# 3 chromosome_4 intron g2108             1
# 4 chromosome_4 mRNA   g2108             1
# 5 chromosome_8 exon   g608              1
# 6 chromosome_8 gene   g608              1
# 7 chromosome_8 intron g608              1
# 8 chromosome_8 mRNA   g608              1
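
As for the error in the question: filter() keeps the rows for which a logical condition is TRUE, so passing bare column names such as V1 fails because they are character vectors, not conditions. If what you need is how often each motif occurs, rather than how many distinct motifs there are, count() may be closer. The sketch below is only an illustration under assumptions: the small gene1 and gene2 tables are made-up stand-ins for your per-population data, and count_motifs, n_occurrences, pop1, pop2, and population are hypothetical names chosen for the example.

```r
library(dplyr)

# Hypothetical stand-ins for the per-population tables (gene1, gene2, ...)
gene1 <- data.frame(
  V1  = c("chromosome_4", "chromosome_4", "chromosome_4"),
  V7  = c("mRNA", "mRNA", "exon"),
  V16 = c("Motif:Helitron-1_Dmon", "Motif:Helitron-1_Dmon",
          "Motif:Helitron-11_Dmon"),
  V20 = c("g2108", "g2108", "g2108")
)
gene2 <- data.frame(
  V1  = "chromosome_8",
  V7  = "intron",
  V16 = "Motif:Helitron-1_Dmon",
  V20 = "g608"
)

# Count rows for each chromosome / gene part / gene / motif combination
count_motifs <- function(df) {
  df %>% count(V1, V7, V20, V16, name = "n_occurrences")
}

# Stack the per-population counts; .id stores each list name in a new column
all_counts <- bind_rows(
  list(pop1 = count_motifs(gene1), pop2 = count_motifs(gene2)),
  .id = "population"
)

print(all_counts)
```

Naming the list elements and passing .id to bind_rows() keeps track of which population each row came from, which is easy to lose when the tables are simply stacked.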
