tidyr package in R, using gather() "Invalid column specification"

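I'm still learning how to use tidyr. I want to use gather() to split the path columns into multiple rows, keeping the gene_ID column and repeating its value wherever a gene has more than one pathway. Sample input data: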

    gene_ID path1   path2   path3   path4   path5   path6   path7   path8
CAMNT_0043146643    RNA transport                           
CAMNT_0029561721    Ribosome                            
CAMNT_0024703307    Sphingolipid signaling pathway  Lysosome                        
CAMNT_0020981363    mRNA surveillance pathway   Hippo signaling pathway cAMP signaling pathway  cGMP - PKG signaling pathway    Regulation of actin cytoskeleton    Meiosis - yeast Oocyte meiosis  Focal adhesion
CAMNT_0020021387    Spliceosome Protein processing in endoplasmic reticulum MAPK signaling pathway  Endocytosis             
CAMNT_0003293445    Spliceosome Protein processing in endoplasmic reticulum MAPK signaling pathway  Endocytosis             

Desired output data:

gene_ID Pathway
CAMNT_0043146643    RNA transport
CAMNT_0029561721    Ribosome
CAMNT_0024703307    Lysosome
CAMNT_0024703307    Sphingolipid signaling pathway
CAMNT_0020981363    mRNA surveillance pathway
CAMNT_0020981363    Hippo signaling pathway
CAMNT_0020981363    cAMP signaling pathway
CAMNT_0020981363    cGMP - PKG signaling pathway
CAMNT_0020981363    Regulation of actin cytoskeleton
CAMNT_0020981363    Meiosis - yeast
CAMNT_0020981363    Oocyte meiosis
CAMNT_0020981363    Focal adhesion
CAMNT_0020021387    Spliceosome
CAMNT_0020021387    Protein processing in endoplasmic reticulum
CAMNT_0020021387    MAPK signaling pathway
CAMNT_0020021387    Endocytosis
CAMNT_0003293445    Spliceosome
CAMNT_0003293445    Protein processing in endoplasmic reticulum
CAMNT_0003293445    MAPK signaling pathway
CAMNT_0003293445    Endocytosis

Currently, I am trying:

temp<-gather(extract,"gene_ID",path1:path8)

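But I get the error message "Error: Invalid column specification". I have tried my input df both with and without headers, and the same error occurs. I'm open to alternative methods, but I have run into problems with NAs because not all gene_ID rows have the same number of filled columns.

Any suggestions on how to proceed?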
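
A likely cause, for context: in gather() the second and third arguments are the names of the new key and value columns, and the columns to stack come after them. In the call above, "gene_ID" is therefore read as the key name and path1:path8 as the value name, which is probably what triggers the error. Below is a minimal sketch of the corrected argument order. The small `extract` data frame is a hypothetical stand-in with only two path columns, and `long` is just an illustrative name.

```r
library(tidyr)

# Hypothetical stand-in for the question's `extract` data frame
extract <- data.frame(
  gene_ID = c("CAMNT_0043146643", "CAMNT_0024703307"),
  path1   = c("RNA transport", "Sphingolipid signaling pathway"),
  path2   = c("", "Lysosome"),
  stringsAsFactors = FALSE
)

# Treat empty cells as missing so that na.rm = TRUE can drop them
extract[extract == ""] <- NA

# key and value name the NEW columns; the columns to stack follow them
long <- gather(extract, key = "path", value = "Pathway", path1:path2,
               na.rm = TRUE)

# Drop the helper column recording which path column each value came from
long <- long[, c("gene_ID", "Pathway")]
long
```

With the real data the column range would be path1:path8 instead of path1:path2.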

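library(reshape2)

# Small example: empty strings stand for missing pathways
df <- data.frame(x = c("a", "b", "c", "d", "e"),
                 path1 = c("test1", "test1", "test2", "test2", "test3"),
                 path2 = c("testa", "", "testg", "testd", ""))
# Convert empty strings to NA so that melt() can drop them
df[df == ""] <- NA
# Stack the path columns into rows, keeping x as the identifier
melt(df, id.vars = "x", na.rm = TRUE)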
#   x variable value
# 1 a    path1 test1
# 2 b    path1 test1
# 3 c    path1 test2
# 4 d    path1 test2
# 5 e    path1 test3
# 6 a    path2 testa
# 8 c    path2 testg
# 9 d    path2 testd

Here is a tidyr solution:

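library(tidyr)
library(dplyr)

df %>%
  gather(path, Pathway, path1, path2) %>%  # stack path1 and path2 into Pathway
  filter(Pathway != "") %>%                # drop empty cells
  select(-path)                            # discard the source-column label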

  x Pathway
1 a   test1
2 b   test1
3 c   test2
4 d   test2
5 e   test3
6 a   testa
7 c   testg
8 d   testd
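
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(), so the same reshape can also be sketched with the newer function. This assumes a tidyr version that provides pivot_longer(); `long` is just an illustrative name, and the example data frame is rebuilt here so the snippet runs on its own.

```r
library(tidyr)

df <- data.frame(x = c("a", "b", "c", "d", "e"),
                 path1 = c("test1", "test1", "test2", "test2", "test3"),
                 path2 = c("testa", "", "testg", "testd", ""),
                 stringsAsFactors = FALSE)

# Mark empty cells as missing so that values_drop_na = TRUE removes them
df[df == ""] <- NA

long <- pivot_longer(df,
                     cols = path1:path2,
                     names_to = "path",        # which column each value came from
                     values_to = "Pathway",
                     values_drop_na = TRUE)

# Keep only the identifier and the stacked values
long <- long[, c("x", "Pathway")]
long
```

Note that pivot_longer() orders the result by original row (all values for a, then b, and so on) rather than column by column as gather() does, so the rows appear in a different order from the output above.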