How can I plot box-plots for principal components 1, 2 and 3 for three different groups?
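I ran a principal component analysis (PCA) and plotted PC1 against PC2. The plot shows the variation in the expression of about 14 genes across three disease groups: 0 (control), 1 (ulcerative colitis) and 2 (Crohn's disease).
I would like to draw a box-plot for each group for each of the first three principal components, giving 9 box-plots in total.
The data matrix used as input to the PCA has row names corresponding to the numbers 0, 1 or 2. The columns represent the different genes (with the corresponding gene expression values).
I used prcomp to compute the PCA (log-transformed, then centred and scaled).
Here is a snapshot of my matrix before the PCA: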
structure(c(9.11655423831332, 10.489164314825, 1.91402056531454,
7.15827328042159, 4.24137583841638, 8.27769344002199, 8.56104058610663,
10.4808234419919, 2.90978833628418, 6.23818256006594, 5.22964773531333,
10.7708328724305, 7.29461400089235, 11.8318994425553, 3.03424662623575,
8.01272738639518, 4.99017087770597, 11.5985078491858, 7.81888257764922,
11.9022935347989, 1.27378277405718, 7.22371591364402, 5.35032777682152,
11.3245694322554, 7.53493825433311, 12.3702117577478, 2.28591365299837,
6.3684670711928, 4.79325114470697, 11.2368359301193, 7.42400102411584,
10.4893608659259, 2.29357094839174, 7.39880980207098, 4.06127337845416,
10.064874404576, 8.23639009062635, 12.041628287702, 1.68881444318413,
6.83433748681479, 4.58216981866268, 10.7369117797388, 8.52022902181642,
11.8310518930764, 1.09698581801487, 7.01560705946119, 4.42096319700341,
9.55024900954538, 6.78397242802669, 10.7346656491963, 1.8562428132184,
6.79381714159694, 4.76311785326908, 9.2896578696716, 7.38261637784709,
11.8956476271189, 0.676793904156995, 7.12068629785535, 4.50969591112091,
10.3965680730289, 7.76024460081224, 11.4191374294463, 2.51273901194187,
6.49764372886188, 5.95216200154652, 8.80877686581081, 7.92745512232284,
9.64936710370214, 2.75037060332872, 8.32919606967059, 5.13312284319216,
10.0205608136955, 8.32640003009823, 10.7914139100956, 3.07554840032925,
7.71871340592007, 5.75595649315905, 9.71791978048218, 7.13284940508783,
10.9113426747693, 1.07350504928193, 6.56249247218448, 5.35574874951741,
9.54833175767732), .Dim = c(6L, 14L), .Dimnames = list(c("1",
"1", "0", "0", "2", "2"), c("Gene1", "Gene2", "Gene3", "Gene4",
"Gene5", "Gene6", "Gene7", "Gene8", "Gene9", "Gene10", "Gene11",
"Gene12", "Gene13", "Gene14")))
Update: second question removed.
The code for the PCA plot is as follows:
data.mat.1.pca <- prcomp(log(data.mat.1), scale.=T, center=T)
pcvalues <- summary(data.mat.1.pca)
#colour coding each disease group
rownames(data.mat.1)
colour_disease <- rownames(data.mat.1)
position_control<- grep("0", colour_disease)
position_UC<- grep("1", colour_disease)
position_Crohn<- grep("2", colour_disease)
disease <- vector()
disease[position_control] <- "lightskyblue"
disease[position_UC] <- "lightslategrey"
disease[position_Crohn] <- "lightpink2"
##proportion of variance explained for PC1 and PC2 for plot
eigs<- data.mat.1.pca$sdev^2
varExplained.pc1<- round(eigs[1]/sum(eigs), digits=3)*100
varExplained.pc2 <- round(eigs[2]/sum(eigs), digits=3)*100
plot(data.mat.1.pca$x[,1], data.mat.1.pca$x[,2],
col=disease, bg=disease, pch=19, cex=1,
xlab=paste("PCA 1 (", varExplained.pc1, "%)", sep=""),
ylab=paste("PCA 2 (", varExplained.pc2, "%)", sep=""))
legend("bottomright", legend = c("Control", "UC", "Crohns"),
fill=c("lightskyblue", "lightslategrey", "lightpink2"))
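A side note on the colour assignment above: grep() matches substrings, so a label such as "1" would also match "10" or "21" if more groups were ever added. A minimal sketch of an exact-match alternative, assuming the group labels are exactly the row names shown in the snapshot; the names `groups` and `group_cols` are made up for illustration:

```r
# Group labels in the same form as rownames(data.mat.1) in the snapshot above.
groups <- c("1", "1", "0", "0", "2", "2")

# Named vector mapping each group label to its colour
# (same colours as in the plotting code above).
group_cols <- c("0" = "lightskyblue",
                "1" = "lightslategrey",
                "2" = "lightpink2")

# Exact-match lookup by name: one colour per sample, in sample order.
disease <- unname(group_cols[groups])
```

The resulting `disease` vector can be passed to `col=` and `bg=` in plot() in the same way as before.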
The values for the first three PCs are as follows:
                          PC1     PC2     PC3
Standard deviation     3.6619 0.44801 0.30046
Proportion of Variance 0.9578 0.01424 0.00645
Cumulative Proportion  0.9578 0.97215 0.97860
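Here is a link to a figure from a research paper: https://www.researchgate.net/figure/Boxplots-of-the-first-three-principal-components-of-the-kidney-data-Group-specific_fig1_316641179
They compare a control group with a treatment group, whereas I need three box-plots (one per group).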
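It is a bit unusual to plot component scores like this, but try the following to get a dot plot for the combinations you mentioned:
library(dplyr)    # provides %>%
library(tidyr)    # provides pivot_longer()
library(ggplot2)
# row.names = NULL: do not carry over the (duplicated) row names of the score matrix
df <- data.frame(disease = rownames(data.mat.1), data.mat.1.pca$x[, 1:3], row.names = NULL)
df %>% pivot_longer(-disease) %>%
  ggplot(aes(x = name, col = disease, y = value)) +
  geom_point(position = position_jitterdodge())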
I hope you have more than 2 samples per group, unlike in your example. Adding the box-plots is straightforward:
df %>% pivot_longer(-disease) %>%
ggplot(aes(x=name,col=disease,y=value)) +
geom_point(position=position_jitterdodge())+
geom_boxplot(alpha=0.7)
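If you would rather stay in base R, as in the plotting code from the question, something along these lines may work. This is only a sketch under stated assumptions: `toy` is simulated data standing in for data.mat.1 (15 samples, 5 per group, 14 genes), and the names `toy`, `pca`, `scores`, `long` and `bp` are made up for illustration.

```r
# Hypothetical toy data standing in for data.mat.1: 15 samples x 14 genes,
# with the group label (0, 1 or 2) stored in the row names.
set.seed(1)
toy <- matrix(rlnorm(15 * 14, meanlog = 2, sdlog = 0.3), nrow = 15,
              dimnames = list(rep(c("0", "1", "2"), each = 5),
                              paste0("Gene", 1:14)))

# Same PCA settings as in the question: log-transform, centre, scale.
pca <- prcomp(log(toy), center = TRUE, scale. = TRUE)

# Stack the first three score columns into long format:
# one row per sample and PC (as.vector() goes column by column).
scores <- pca$x[, 1:3]
long <- data.frame(
  disease = factor(rep(rownames(toy), times = 3),
                   levels = c("0", "1", "2"),
                   labels = c("Control", "UC", "Crohns")),
  pc      = factor(rep(colnames(scores), each = nrow(scores)),
                   levels = colnames(scores)),
  value   = as.vector(scores)
)

# One box per group within each PC: 3 groups x 3 PCs = 9 boxes.
# The three colours are recycled, so each group keeps its colour in every PC.
bp <- boxplot(value ~ disease + pc, data = long,
              col = c("lightskyblue", "lightslategrey", "lightpink2"),
              las = 2, xlab = "", ylab = "PC score")
```

With your own data you would replace `toy` with data.mat.1. The formula `value ~ disease + pc` makes the group vary fastest, so the boxes appear as Control, UC, Crohns within PC1, then within PC2, then within PC3.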