Create a matrix in Perl or R if data is provided in a CSV file
Here is a CSV file, cn0_gene.csv:
X Sample_Name Gene_Names Frequencey
1 gw6.00033 NOT_FOUND 4
102 gw6.0006 ACTA2,FAS,FAS-AS1 1
103 gw6.0006 MMP26,OR51A2 1
104 gw6.0006 NOT_FOUND 5
105 gw6.0006 OR52N1,OR52N5,TRIM5 1
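How do I create a matrix if the data is in a CSV file?
Expected output: the unique Sample_Name values as rows, the Gene_Names values as columns, and Frequencey as the data for each Sample_Name and Gene_Names pair.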
library( data.table )
dt <- fread("./cn0_gene.csv")
dcast( dt, Sample_Name ~ Gene_Names, value.var = "Frequencey" )
# Sample_Name ACTA2,FAS,FAS-AS1 MMP26,OR51A2 NOT_FOUND OR52N1,OR52N5,TRIM5 RHD,RSRP1 RNLS SCAPER TP63 WWOX
# 1: gw6.00033 NA NA 4 NA NA NA NA NA NA
# 2: gw6.0006 1 1 5 1 1 1 1 1 1
To fill the NAs with zeros, use:
dcast( dt, Sample_Name ~Gene_Names, value.var = "Frequencey", fill = 0 )
# Sample_Name ACTA2,FAS,FAS-AS1 MMP26,OR51A2 NOT_FOUND OR52N1,OR52N5,TRIM5 RHD,RSRP1 RNLS SCAPER TP63 WWOX
# 1: gw6.00033 0 0 4 0 0 0 0 0 0
# 2: gw6.0006 1 1 5 1 1 1 1 1 1
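You can also use base R. Note that this version assumes the file also contains a Copy_No column, as the output below shows.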
tt <- read.csv("cn0_gene.csv", header=TRUE, sep="", row.names=1)
reshape(tt, idvar=c("Sample_Name", "Copy_No"), timevar = "Gene_Names", direction="wide")
# Sample_Name Copy_No Frequencey.NOT_FOUND Frequencey.ACTA2,FAS,FAS-AS1
# 1 gw6.00033 cn=0 4 NA
# 102 gw6.0006 cn=0 5 1
# Frequencey.MMP26,OR51A2 Frequencey.OR52N1,OR52N5,TRIM5 Frequencey.RHD,RSRP1
# 1 NA NA NA
# 102 1 1 1
# Frequencey.RNLS Frequencey.SCAPER Frequencey.TP63 Frequencey.WWOX
# 1 NA NA NA NA
# 102 1 1 1 1
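Both answers above return a wide data.table or data.frame rather than a matrix object. If an actual numeric matrix is needed, one possible base R route is xtabs(), which sums Frequencey for each Sample_Name and Gene_Names pair and puts 0 where a pair is absent. This is only a sketch: the data frame below is typed in by hand from the five sample rows in the question (the real file has more rows), and the names tab and m are just illustrative.

```r
# Sample rows copied from the question; the real file contains more rows.
tt <- data.frame(
  Sample_Name = c("gw6.00033", "gw6.0006", "gw6.0006", "gw6.0006", "gw6.0006"),
  Gene_Names  = c("NOT_FOUND", "ACTA2,FAS,FAS-AS1", "MMP26,OR51A2",
                  "NOT_FOUND", "OR52N1,OR52N5,TRIM5"),
  Frequencey  = c(4, 1, 1, 5, 1),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are samples, columns are gene-name strings.
# Pairs that never occur in the data get 0 instead of NA.
tab <- xtabs(Frequencey ~ Sample_Name + Gene_Names, data = tt)

# Rebuild as a plain numeric matrix, keeping the row and column names.
m <- matrix(tab, nrow = nrow(tab), dimnames = dimnames(tab))

print(m)
```

Unlike dcast() and reshape(), xtabs() adds up Frequencey when the same Sample_Name and Gene_Names pair appears more than once, so check that this is what you want if your file has repeated pairs.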