Reshape a data frame by splitting comma-separated values in a column into separate rows
How can I reshape my data.frame by splitting the gene names and Ensemble_ID values into separate rows?
df1<-
ID chrom loc.start loc.end num.mark seg.mean Genes Gene.N Ensemble_ID
88410 1 3010000 173490000 8430 0.0039 Sntg1,Rrs1 SNT,ELF ENSMUSG00000025909,ENSMUSG00000061024,
88410 1 173510000 173590000 5 -1.77380 Ifi203,Mndal REK,MNDAL ENSMUSG00000049598,ENSMUSG00000026104
expected output
Gene.N Genes Ensemble_ID ID chrom loc.start loc.end num.mark seg.mean
SNT Sntg1 ENSMUSG00000025909 88410 1 3010000 173490000 8430 0.0039
ELF Rrs1 ENSMUSG00000061024 88410 1 3010000 173490000 8430 0.0039
REK Ifi203 ENSMUSG00000049598 88410 1 173510000 173590000 5 -1.77380
MNDAL Mndal ENSMUSG00000026104 88410 1 173510000 173590000 5 -1.77380
You can use cSplit from my "splitstackshape" package:
library(splitstackshape)
cSplit(df1, c("Genes", "Gene.N", "Ensemble_ID"), ",", "long")
# ID chrom loc.start loc.end num.mark seg.mean Genes Gene.N Ensemble_ID
# 1: 88410 1 3010000 173490000 8430 0.0039 Sntg1 SNT ENSMUSG00000025909
# 2: 88410 1 3010000 173490000 8430 0.0039 Rrs1 ELF ENSMUSG00000061024
# 3: 88410 1 173510000 173590000 5 -1.7738 Ifi203 REK ENSMUSG00000049598
# 4: 88410 1 173510000 173590000 5 -1.7738 Mndal MNDAL ENSMUSG00000026104
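If adding a package is not an option, the same reshaping can be sketched in base R. This is only an illustration, not how cSplit works internally: it rebuilds df1 by hand from the values in the question, and the helper names (split_cols, pieces, n, long) are made up for this example. It assumes every row holds the same number of comma-separated values in each of the three columns. The trailing comma in the first Ensemble_ID value is harmless here because strsplit() does not return a trailing empty field.

```r
# Rebuild the example data from the question (values copied by hand).
df1 <- data.frame(
  ID = c(88410, 88410),
  chrom = c(1, 1),
  loc.start = c(3010000, 173510000),
  loc.end = c(173490000, 173590000),
  num.mark = c(8430, 5),
  seg.mean = c(0.0039, -1.7738),
  Genes = c("Sntg1,Rrs1", "Ifi203,Mndal"),
  Gene.N = c("SNT,ELF", "REK,MNDAL"),
  Ensemble_ID = c("ENSMUSG00000025909,ENSMUSG00000061024,",
                  "ENSMUSG00000049598,ENSMUSG00000026104"),
  stringsAsFactors = FALSE
)

# Columns whose comma-separated values should become separate rows.
split_cols <- c("Genes", "Gene.N", "Ensemble_ID")

# Split each of those columns on commas; the result is one list per column,
# with one character vector per original row.
pieces <- lapply(df1[split_cols], strsplit, split = ",", fixed = TRUE)

# Number of values per original row, taken from the first split column.
n <- lengths(pieces[[1]])

# Assumption: all split columns yield the same number of values in each row.
stopifnot(all(vapply(pieces, function(p) identical(lengths(p), n), logical(1))))

# Repeat the untouched columns once per value, then attach the split values.
idx <- rep(seq_len(nrow(df1)), times = n)
long <- df1[idx, setdiff(names(df1), split_cols), drop = FALSE]
long[split_cols] <- lapply(pieces, unlist)
rownames(long) <- NULL

long
```

The result is a plain data.frame with the same column order as df1, whereas cSplit returns a data.table. If rows can hold different numbers of values per column, the stopifnot() check fails rather than silently misaligning genes and IDs.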