如何使用我的数据集计算主成分分析以绘制显示 PC1 与 PC2 的图表?

How to calculate principal component analysis to plot graph showing PC1 vs PC2 using my data set?

请有人建议我如何使用我拥有的基因数据集进行主成分分析。

我有一个包含 15 列的 table。第一个是疾病组,其中 0 是对照,1 是溃疡性结肠炎,2 代表克罗恩病。

剩下的14列对应14个不同的基因。

我想在 PCA 之后绘制 PC1 与 PC2(通过 prcomp),以显示三组之间是否根据基因表达数据(每个轴显示方差比例)发生任何聚类或分离。但是,我很难知道从哪里开始,因为我无法通过 row.names=1 将第 1 列转换为行名,因为 R 不允许重复行名。

转换为矩阵并尝试使用以下代码无效。

mockdata1 <- as.matrix(mockdata)
rownames(mockdata1) <- mockdata1[,1]
mockdata1[,1] <- NULL
or 
mockdata2 <-mockdata1 [ ,-1]

通过我之前完成的示例,我已经能够计算 PCA 并绘制 PCA1 与 PCA2 并相应地为数据着色,遵循 row.names=1,但不确定如何克服这个问题第一个初始问题,因为我不能在这里使用它。

我已经通过 dput(head(mockdata))

在下面包含了我的数据
structure(list(Disease = c(1L, 1L, 0L, 0L, 2L, 2L), Gene1 = c(9104.774619, 
35924.12358, 6.780294688, 1284.690716, 69.50341155, 3935.107345
), Gene2 = c(5224.114486, 35625.73119, 18.35291351, 511.9272679, 
186.7270146, 47611.65544), Gene3 = c(1472.348466, 137571.5525, 
20.78531289, 3019.140256, 146.9615338, 108935.1303), Gene4 = c(2487.124686, 
147604.774, 3.574347972, 1371.576262, 210.6773417, 82831.97458
), Gene5 = c(1872.328747, 235675.6461, 9.834667594, 583.1631957, 
120.6931223, 75874.49936), Gene6 = c(1675.724728, 35931.1852, 
9.91026361, 1634.038443, 58.04818134, 23502.78972), Gene7 = c(3775.885073, 
169672.9921, 5.41305941, 929.2125312, 97.72621248, 46023.7009
), Gene8 = c(5015.202216, 137455.0032, 2.995124554, 1113.882634, 
83.17636201, 14048.19237), Gene9 = c(883.5716868, 45920.44167, 
6.399646876, 892.313155, 117.1104906, 10825.47974), Gene10 = c(1607.790858, 
146627.0588, 1.967559425, 1237.299298, 90.8941744, 32747.04713
), Gene11 = c(2345.478241, 91047.57303, 12.33867961, 663.576224, 
384.5839119, 6692.728154), Gene12 = c(2772.362496, 15511.96753, 
15.64843017, 4143.085461, 169.545757, 22484.03574), Gene13 = c(4131.51741, 
48601.7059, 21.66175797, 2250.0628, 316.0677196, 16612.6508), 
Gene14 = c(1252.440598, 54794.36695, 2.925615978, 708.0342528, 
211.822519, 14021.28425)), row.names = c(NA, 6L), class = "data.frame")

我很久以前就读过 PCA 背后的统计数据。它基本上生成 df*t(df)。如果你转置这个,你可以获得 14 个跨基因的 PC。或者您可以对 6 个样本执行此操作。不确定您对哪一个感兴趣,比较基因,还是疾病?

现在,只需更改“疾病”列。

df1 <- df[,-1]

prcomp(df1, scale=TRUE)
Standard deviations (1, .., p=6):
[1] 3.535100e+00 1.197773e+00 2.341240e-01 1.164908e-01 4.575297e-03 1.341079e-16

Rotation (n x k) = (14 x 6):
             PC1         PC2         PC3          PC4         PC5         PC6
Gene1  0.2599087 -0.28372577  0.85593436  0.105415154  0.22237406 -0.03275210
Gene2  0.2217516  0.51374227  0.19750686  0.587702793 -0.28955565 -0.05771744
Gene3  0.2658620  0.28239514 -0.18315564  0.178995551  0.20110401  0.55717953
Gene4  0.2793220  0.12737931 -0.15464656  0.169556323  0.07195134 -0.18818707
Gene5  0.2818335 -0.06260368 -0.17524380  0.066508759  0.19418520 -0.23609300
Gene6  0.2753095  0.19146854 -0.05941163 -0.009035774  0.27267048 -0.02872877
Gene7  0.2804052 -0.10775145 -0.11519116  0.035882068  0.20239393 -0.18183889
Gene8  0.2697234 -0.25139746 -0.04378742 -0.067463492  0.21022545  0.41231175
Gene9  0.2787618 -0.13908918 -0.13407274 -0.103917770 -0.05348686  0.14863994
Gene10 0.2782801 -0.14660837 -0.15870274 -0.039789645  0.20346369 -0.24515625
Gene11 0.2672274 -0.27313184 -0.09102046 -0.087085239 -0.41532869  0.18704109
Gene12 0.2089616  0.55676419  0.20707121 -0.730763260  0.02663728 -0.08109933
Gene13 0.2819621 -0.06202963  0.11256511 -0.133063115 -0.54206507  0.22546812
Gene14 0.2797161 -0.12219139 -0.12034622 -0.025956127 -0.33551757 -0.46422864

对于不同疾病的 PC,转置您的矩阵:

df2 <- t(df)
colnames(df2) <- df$Disease
df3 <- df2[-1,]
prcomp(df3,scale=TRUE)
Standard deviations (1, .., p=6):
[1] 1.3881805 1.3086775 1.0332001 0.8999575 0.5686278 0.3994426

Rotation (n x k) = (6 x 6):
         PC1        PC2         PC3        PC4        PC5         PC6
1 -0.1761229 -0.4084611  0.42521515 -0.7364243  0.2424742 -0.14218944
1  0.5843042  0.3073572 -0.04722848 -0.2941286  0.4419985  0.52916471
0 -0.5145338  0.4178422  0.04746496 -0.3238892 -0.4356411  0.51353922
0 -0.4042091  0.3652627  0.47093069  0.3634759  0.5908253 -0.01526729
2 -0.3531166  0.1750554 -0.74358399 -0.2618183  0.3973516 -0.25555829
2  0.2734007  0.6324854  0.20003938 -0.2561246 -0.2215796 -0.60868811