Create a matrix in Perl or R from data provided in a CSV file

Here is a CSV file, cn0_gene.csv:

    X Sample_Name          Gene_Names Frequencey
    1   gw6.00033           NOT_FOUND          4
  102    gw6.0006   ACTA2,FAS,FAS-AS1          1
  103    gw6.0006        MMP26,OR51A2          1
  104    gw6.0006           NOT_FOUND          5
  105    gw6.0006 OR52N1,OR52N5,TRIM5          1

How can I create a matrix from the data in this CSV file?

Expected output: the unique Sample_Name values as rows, the Gene_Names values as columns, and the frequency as the value for each Sample_Name/Gene_Names pair.

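library(data.table)

# Read the file, then reshape from long to wide:
# one row per Sample_Name, one column per Gene_Names value.
dt <- fread("./cn0_gene.csv")
dcast(dt, Sample_Name ~ Gene_Names, value.var = "Frequencey")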

#    Sample_Name ACTA2,FAS,FAS-AS1 MMP26,OR51A2 NOT_FOUND OR52N1,OR52N5,TRIM5 RHD,RSRP1 RNLS SCAPER TP63 WWOX
# 1:   gw6.00033                NA           NA         4                  NA        NA   NA     NA   NA   NA
# 2:    gw6.0006                 1            1         5                   1         1    1      1    1    1

To replace the NAs with zeros, use:

dcast(dt, Sample_Name ~ Gene_Names, value.var = "Frequencey", fill = 0)

#    Sample_Name ACTA2,FAS,FAS-AS1 MMP26,OR51A2 NOT_FOUND OR52N1,OR52N5,TRIM5 RHD,RSRP1 RNLS SCAPER TP63 WWOX
# 1:   gw6.00033                 0            0         4                   0         0    0      0    0    0
# 2:    gw6.0006                 1            1         5                   1         1    1      1    1    1
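
Note that dcast() returns a data.table, not a matrix. If an actual matrix is needed, the following is a minimal sketch of one way to get it, assuming a data.table version whose as.matrix() method accepts the rownames argument. The rows are typed in from the excerpt in the question rather than read from the file, and the names wide and m are only illustrative.

```r
library(data.table)

# Sample rows copied from the question; the real file has more rows.
dt <- data.table(
  Sample_Name = c("gw6.00033", "gw6.0006", "gw6.0006", "gw6.0006", "gw6.0006"),
  Gene_Names  = c("NOT_FOUND", "ACTA2,FAS,FAS-AS1", "MMP26,OR51A2",
                  "NOT_FOUND", "OR52N1,OR52N5,TRIM5"),
  Frequencey  = c(4, 1, 1, 5, 1)
)

# Long to wide, with missing combinations filled with 0.
wide <- dcast(dt, Sample_Name ~ Gene_Names, value.var = "Frequencey", fill = 0)

# Move Sample_Name into the row names so that only numeric cells remain.
m <- as.matrix(wide, rownames = "Sample_Name")
```

Cells can then be looked up by name, for example m["gw6.0006", "NOT_FOUND"].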

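You can also use base R.

# sep = "" treats any run of whitespace as the field separator.
# Copy_No is a column of the full file (it appears in the output below),
# although the excerpt in the question does not show it.
tt <- read.csv("cn0_gene.csv", header = TRUE, sep = "", row.names = 1)
reshape(tt, idvar = c("Sample_Name", "Copy_No"), timevar = "Gene_Names", direction = "wide")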

#     Sample_Name Copy_No Frequencey.NOT_FOUND Frequencey.ACTA2,FAS,FAS-AS1
# 1     gw6.00033    cn=0                    4                           NA
# 102    gw6.0006    cn=0                    5                            1
#     Frequencey.MMP26,OR51A2 Frequencey.OR52N1,OR52N5,TRIM5 Frequencey.RHD,RSRP1
# 1                        NA                             NA                   NA
# 102                       1                              1                    1
#     Frequencey.RNLS Frequencey.SCAPER Frequencey.TP63 Frequencey.WWOX
# 1                NA                NA              NA              NA
# 102               1                 1               1               1
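
reshape() returns a data frame with NA for the missing combinations. As a possible alternative that stays in base R and yields a numeric matrix with zeros, here is a hedged sketch using xtabs(). This is a different technique from the reshape() call above, not a variant of it. The rows are typed in from the excerpt in the question, and the names tab and m are only illustrative.

```r
# Sample rows copied from the question; the real file has more rows.
tt <- data.frame(
  Sample_Name = c("gw6.00033", "gw6.0006", "gw6.0006", "gw6.0006", "gw6.0006"),
  Gene_Names  = c("NOT_FOUND", "ACTA2,FAS,FAS-AS1", "MMP26,OR51A2",
                  "NOT_FOUND", "OR52N1,OR52N5,TRIM5"),
  Frequencey  = c(4, 1, 1, 5, 1),
  stringsAsFactors = FALSE
)

# xtabs() sums Frequencey within each Sample_Name/Gene_Names cell;
# combinations that never occur come out as 0.
tab <- xtabs(Frequencey ~ Sample_Name + Gene_Names, data = tt)

# Rebuild as a plain numeric matrix, keeping the row and column names.
m <- matrix(tab, nrow = nrow(tab), dimnames = dimnames(tab))
```

Because xtabs() sums within each cell, duplicated Sample_Name/Gene_Names pairs would be added together rather than reported, which may or may not be what is wanted for the full file.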