Plot PCA vs. one dimension in R

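I have a dataset with 10 dimensions as features and 1 dimension as the cluster number (11 dimensions altogether). How can I plot the PCA of the data (PC1) against the cluster number using R?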

qplot(x = not_null_df$TSC_8125, y =  pca, data = subset(not_null_df, select = c (not_null_df$AVG_ERTEBAT,not_null_df$AVG_ROSHD,not_null_df$AVG_HOGHOGH,not_null_df$AVG_MM,not_null_df$AVG_MK,not_null_df$AVG_TM,not_null_df$AVG_VEJHE,not_null_df$AVG_ANGIZEH,not_null_df$AVG_TAHOD)), main = "Loadings for PC1", xlab = "cluster number")


Don't know how to automatically pick scale for object of type princomp. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (564): x, y

     ï..QN           NAMECODE        GENDER      VAZEYATTAAHOL     TAHSILAT          SEN           SABEGHE     
 Min.   :  1.00   Min.   : 1.0   Min.   :1.000   Min.   :1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.: 28.00   1st Qu.:11.0   1st Qu.:1.000   1st Qu.:1.75   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.000  
 Median : 60.00   Median :13.0   Median :1.000   Median :2.00   Median :3.000   Median :1.000   Median :1.000  
 Mean   : 68.63   Mean   :11.7   Mean   :1.152   Mean   :1.75   Mean   :2.578   Mean   :1.394   Mean   :1.121  
 3rd Qu.:103.25   3rd Qu.:14.0   3rd Qu.:1.000   3rd Qu.:2.00   3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:1.000  
 Max.   :190.00   Max.   :16.0   Max.   :2.000   Max.   :2.00   Max.   :3.000   Max.   :3.000   Max.   :3.000  
  AVG_ERTEBAT       AVG_ROSHD       AVG_HOGHOGH         AVG_MM           AVG_MK           AVG_TM         AVG_VEJHE     
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 5.333   1st Qu.: 4.125   1st Qu.: 1.750   1st Qu.: 5.000   1st Qu.: 3.125   1st Qu.: 5.981   1st Qu.: 4.556  
 Median : 7.000   Median : 5.875   Median : 3.500   Median : 7.727   Median : 5.000   Median : 8.000   Median : 6.333  
 Mean   : 6.730   Mean   : 5.787   Mean   : 4.001   Mean   : 6.903   Mean   : 4.890   Mean   : 7.390   Mean   : 6.095  
 3rd Qu.: 8.425   3rd Qu.: 7.656   3rd Qu.: 6.000   3rd Qu.: 9.182   3rd Qu.: 6.688   3rd Qu.: 9.204   3rd Qu.: 7.778  
 Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000  
  AVG_ANGIZEH       AVG_TAHOD        AVG_SOALAT        TSC_8125          avg       
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   :1.000   Min.   :0.000  
 1st Qu.: 5.000   1st Qu.: 5.833   1st Qu.: 4.000   1st Qu.:1.000   1st Qu.:4.788  
 Median : 7.000   Median : 7.667   Median : 7.000   Median :2.000   Median :6.301  
 Mean   : 6.549   Mean   : 7.171   Mean   : 6.025   Mean   :2.046   Mean   :6.154  
 3rd Qu.: 8.750   3rd Qu.: 9.000   3rd Qu.: 8.000   3rd Qu.:3.000   3rd Qu.:7.599  
 Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :3.000   Max.   :9.978  

I can get the PCA with this code:

pca <- princomp(not_null_df, cor=TRUE, scores=TRUE)

Importance of components:
                         Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6     Comp.7     Comp.8     Comp.9
Standard deviation     2.887437 1.28937443 1.12619079 1.08816449 0.98432226 0.91257779 0.90980017 0.82303807 0.74435256
Proportion of Variance 0.438805 0.08749929 0.06675293 0.06232116 0.05099423 0.04383149 0.04356507 0.03565219 0.02916109
Cumulative Proportion  0.438805 0.52630426 0.59305720 0.65537835 0.70637258 0.75020406 0.79376914 0.82942133 0.85858242
                          Comp.10    Comp.11    Comp.12    Comp.13    Comp.14    Comp.15   Comp.16    Comp.17     Comp.18
Standard deviation     0.70304085 0.67709130 0.62905993 0.59284646 0.50799135 0.48013732 0.4476952 0.39317004 0.378722707
Proportion of Variance 0.02601402 0.02412909 0.02082718 0.01849826 0.01358185 0.01213325 0.0105490 0.00813593 0.007548994
Cumulative Proportion  0.88459644 0.90872553 0.92955271 0.94805097 0.96163282 0.97376607 0.9843151 0.99245101 1.000000000
                            Comp.19
Standard deviation     1.838143e-08
Proportion of Variance 1.778301e-17
Cumulative Proportion  1.000000e+00

My goal is to plot the PCA (just Comp.1) against TSC_8125 (that is, the cluster number).

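The function princomp() returns a list with 7 elements: sdev, loadings, center, scale, n.obs, scores and call. You can find descriptions of these on the function's help page (type ?princomp to open it). Given the purpose of your plot, the element of interest here is probably the scores.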

scores: the scores of the supplied data on the principal components.

loadings: the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors).

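The simplest way to access the elements of a list is the $ operator, so pca$scores and pca$loadings will access the scores and the loadings respectively. Both are matrices (loadings additionally carries the class "loadings"), and each column corresponds to one principal component: the first column is the first principal component, and so on. To get the scores on the first component, take the first column: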
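To make the structure of the returned list concrete, here is a minimal sketch on the built-in USArrests data rather than on your data frame. The names pca_demo and pc1_scores are made up for this illustration.

```r
# Illustration on the built-in USArrests data (not the asker's data).
# `pca_demo` and `pc1_scores` are hypothetical names used only here.
pca_demo <- princomp(USArrests, cor = TRUE, scores = TRUE)

names(pca_demo)          # the elements of the returned list
dim(pca_demo$scores)     # one row per observation, one column per component
dim(pca_demo$loadings)   # one row per variable, one column per component

# Scores of every observation on the first principal component
pc1_scores <- pca_demo$scores[, 1]
head(pc1_scores)
```

The key point is that the scores have one row per observation, so a single column of them lines up row by row with any column of the original data frame.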


comp.1 <- pca$scores[,1]


plot (comp.1 ~ not_null_df$TSC_8125)

If you prefer, you can also plot it with qplot:

qplot(x = not_null_df$TSC_8125, y =  comp.1, main = "Scores for PC1", xlab = "cluster number")
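One thing that may be worth checking: your summary shows 19 components, which suggests the PCA was run on every column of not_null_df, including TSC_8125 itself. If you only want the feature columns to drive the components, you could run princomp() on those columns alone and treat the cluster number as a factor when plotting. Below is a rough sketch of that idea on built-in data, assuming a k-means result as a stand-in for your cluster column. The names features, cluster, pca_feat and plot_df are hypothetical.

```r
# Illustration on built-in data; `features`, `cluster`, `pca_feat` and
# `plot_df` are hypothetical names, not objects from the question.
set.seed(1)
features <- scale(USArrests)                        # numeric feature columns only
cluster  <- kmeans(features, centers = 3)$cluster   # stand-in for a cluster column

# PCA on the features only, so the cluster label does not shape the components
pca_feat <- princomp(features, cor = TRUE, scores = TRUE)

plot_df <- data.frame(
  cluster = factor(cluster),       # treat the cluster number as categorical
  pc1     = pca_feat$scores[, 1]   # scores on the first principal component
)

# With a factor on the right-hand side, plot() draws one boxplot per cluster
plot(pc1 ~ cluster, data = plot_df,
     xlab = "cluster number", ylab = "PC1 score", main = "Scores for PC1")
```

With your data, the equivalent would be to pass only the AVG_* feature columns to princomp() and use factor(not_null_df$TSC_8125) on the x axis. Whether a boxplot or a scatter plot reads better depends on how many points each cluster has.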