tidyr package in R: gather() fails with "Invalid column specification"
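I'm still learning how to use tidyr. I want to use gather() to turn several columns into rows, keeping the gene_ID column by repeating it wherever applicable.
Example input data: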
gene_ID,path1,path2,path3,path4,path5,path6,path7,path8
CAMNT_0043146643,RNA transport,,,,,,,
CAMNT_0029561721,Ribosome,,,,,,,
CAMNT_0024703307,Sphingolipid signaling pathway,Lysosome,,,,,,
CAMNT_0020981363,mRNA surveillance pathway,Hippo signaling pathway,cAMP signaling pathway,cGMP - PKG signaling pathway,Regulation of actin cytoskeleton,Meiosis - yeast,Oocyte meiosis,Focal adhesion
CAMNT_0020021387,Spliceosome,Protein processing in endoplasmic reticulum,MAPK signaling pathway,Endocytosis,,,,
CAMNT_0003293445,Spliceosome,Protein processing in endoplasmic reticulum,MAPK signaling pathway,Endocytosis,,,,
Example of the desired output:
gene_ID Pathway
CAMNT_0043146643 RNA transport
CAMNT_0029561721 Ribosome
CAMNT_0024703307 Lysosome
CAMNT_0024703307 Sphingolipid signaling pathway
CAMNT_0020981363 mRNA surveillance pathway
CAMNT_0020981363 Hippo signaling pathway
CAMNT_0020981363 cAMP signaling pathway
CAMNT_0020981363 cGMP - PKG signaling pathway
CAMNT_0020981363 Regulation of actin cytoskeleton
CAMNT_0020981363 Meiosis - yeast
CAMNT_0020981363 Oocyte meiosis
CAMNT_0020981363 Focal adhesion
CAMNT_0020021387 Spliceosome
CAMNT_0020021387 Protein processing in endoplasmic reticulum
CAMNT_0020021387 MAPK signaling pathway
CAMNT_0020021387 Endocytosis
CAMNT_0003293445 Spliceosome
CAMNT_0003293445 Protein processing in endoplasmic reticulum
CAMNT_0003293445 MAPK signaling pathway
CAMNT_0003293445 Endocytosis
Currently, I'm trying:
temp<-gather(extract,"gene_ID",path1:path8)
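But I get the error message: "Error: Invalid column specification"
I have tried my input df both with and without headers, and the same error occurs. I'm open to alternative approaches, but I have run into trouble with NAs, because not every gene_ID row has the same number of filled columns.
Any suggestions on how to proceed?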
df <- data.frame(x = c("a", "b", "c", "d", "e"),
                 path1 = c("test1", "test1", "test2", "test2", "test3"),
                 path2 = c("testa", "", "testg", "testd", ""))
library(reshape2)
# Recode empty strings as NA so that melt() can drop them
df[df == ""] <- NA
melt(df, id.vars = "x", na.rm = TRUE)
# x variable value
# 1 a path1 test1
# 2 b path1 test1
# 3 c path1 test2
# 4 d path1 test2
# 5 e path1 test3
# 6 a path2 testa
# 8 c path2 testg
# 9 d path2 testd
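To get from the melt() result above to the two-column shape asked for in the question, one option is to name the value column directly and drop the `variable` column afterwards. This is only a sketch on the same toy `df`; `long` is a name introduced here for illustration, and `stringsAsFactors = FALSE` is set explicitly so the columns are plain character vectors on any R version.

```r
library(reshape2)

# Same toy data as in the answer above
df <- data.frame(x = c("a", "b", "c", "d", "e"),
                 path1 = c("test1", "test1", "test2", "test2", "test3"),
                 path2 = c("testa", "", "testg", "testd", ""),
                 stringsAsFactors = FALSE)

# Recode empty strings as NA so that melt() can drop them
df[df == ""] <- NA

# value.name sets the name of the value column in the long result
long <- melt(df, id.vars = "x", na.rm = TRUE, value.name = "Pathway")

# The column holding the original column names (path1, path2) is not needed
long$variable <- NULL
```

For the real data, `id.vars = "gene_ID"` would take the place of `"x"`.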
Here is a tidyr solution:
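library(dplyr)
library(tidyr)

# Gather path1 and path2 into key/value pairs, drop the blanks, then drop the key
df %>%
  gather(path, Pathway, path1, path2) %>%
  filter(Pathway != "") %>%
  select(-path)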
x Pathway
1 a test1
2 b test1
3 c test2
4 d test2
5 e test3
6 a testa
7 c testg
8 d testd
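Applied to the layout in the question, the same idea might look like the sketch below. The likely cause of the original error is argument order: gather() takes the names of the new key and value columns first and the columns to gather after them, so in `gather(extract, "gene_ID", path1:path8)` the range `path1:path8` ends up in the value-name position. The `extract` data frame here is a made-up stand-in with only path1 to path3, and `long` and `long2` are names introduced for illustration. Since gather() is superseded in current tidyr, the equivalent pivot_longer() call is shown as well (this assumes tidyr 1.0.0 or later).

```r
library(tidyr)

# Hypothetical stand-in for the question's `extract` data frame
extract <- data.frame(
  gene_ID = c("CAMNT_0043146643", "CAMNT_0024703307"),
  path1   = c("RNA transport", "Sphingolipid signaling pathway"),
  path2   = c("", "Lysosome"),
  path3   = c("", ""),
  stringsAsFactors = FALSE
)

# Treat empty cells as missing so they can be dropped while reshaping
extract[extract == ""] <- NA

# Key and value column names come first, then the columns to gather
long <- gather(extract, path, Pathway, path1:path3, na.rm = TRUE)
long <- long[, c("gene_ID", "Pathway")]

# The same reshape with pivot_longer()
long2 <- pivot_longer(extract, cols = path1:path3,
                      names_to = "path", values_to = "Pathway",
                      values_drop_na = TRUE)
long2 <- long2[, c("gene_ID", "Pathway")]
```

With the full data the column range would be `path1:path8`. Both calls repeat gene_ID once per non-missing pathway, which is the shape of the desired output.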