How to preserve class labels during PCA
How do I keep the class labels when running PCA? I have looked at two tutorials, and both skip this entirely: tutorial
Here is my code:
import pandas as pd
from sklearn.decomposition import PCA

combinedOutputDataFrame = pd.DataFrame(resultArray)
# Separating out the features (columns 0..31)
x = combinedOutputDataFrame.loc[:, 0:31].values
# Separating out the target (column 32)
y = combinedOutputDataFrame.loc[:, [32]].values

pca = PCA(n_components=3)
principalComponents = pca.fit_transform(x)
principalDataFrame = pd.DataFrame(data=principalComponents,
                                  columns=['principal component 1', 'principal component 2', 'principal component 3'])
finalDf = pd.concat([principalDataFrame, combinedOutputDataFrame[[32]]], axis=1)
But how can I tell what order principalComponents is in?
principalComponents
array([[129.58602603, -21.59786631, -6.84613849],
[-39.42963482, 35.19985695, 19.86945922],
[ 54.81949577, -5.96905719, -76.57776259],
...,
[ 69.21840475, -35.17983093, -39.66853653],
[ 18.91508026, -41.64341368, 0.21503516],
[145.91595004, 127.82236242, 115.14571367]])
My end goal is to visualize this and color each point on the plot by its class. But how do I attach the labels to the data after running PCA?
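The components are already sorted in descending order, from the component that explains the most variance to the one that explains the least. You can check this by printing the explained variance ratios with pca.explained_variance_ratio_: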
import numpy as np
from sklearn.decomposition import PCA
# just a random matrix
rand_matrix = np.random.rand(30,6)
pca = PCA(n_components=3)
principalComponents = pca.fit_transform(rand_matrix)
print(pca.explained_variance_ratio_)
Out:
array([0.28898895, 0.22460396, 0.16874681])
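As for the labels themselves: PCA.fit_transform returns one output row per input row, in the same order, so row i of principalComponents still belongs to label y[i]. Nothing has to be recovered after PCA; the label column only needs to be kept next to the projected rows. Below is a minimal sketch of that plus the colored plot, assuming synthetic data from sklearn's make_blobs as a hypothetical stand-in for resultArray (32 feature columns, class label in column 32, as in the question) and matplotlib for plotting.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line and call plt.show() for a window
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Hypothetical stand-in for resultArray: 32 feature columns plus the label in column 32
features, labels = make_blobs(n_samples=200, n_features=32, centers=3, random_state=0)
resultArray = np.column_stack([features, labels])
combinedOutputDataFrame = pd.DataFrame(resultArray)

x = combinedOutputDataFrame.loc[:, 0:31].values  # columns 0..31 (loc slicing is inclusive)
y = combinedOutputDataFrame[32].values           # 1-D array of class labels

pca = PCA(n_components=3)
principalComponents = pca.fit_transform(x)

# fit_transform keeps the input row order, so the labels can be attached by position
finalDf = pd.DataFrame(principalComponents,
                       columns=['principal component 1', 'principal component 2', 'principal component 3'])
finalDf['label'] = y

# Explained variance ratios, largest component first
print(pca.explained_variance_ratio_)

# One scatter call per class so each class gets its own color and legend entry
fig, ax = plt.subplots()
for label, group in finalDf.groupby('label'):
    ax.scatter(group['principal component 1'], group['principal component 2'],
               label=f'class {label:g}', s=10)
ax.set_xlabel('principal component 1')
ax.set_ylabel('principal component 2')
ax.legend()
fig.savefig('pca_by_class.png')
```

Grouping by the label column is one way to get a per-class legend; passing c=finalDf['label'] to a single scatter call would also color the points, but without named legend entries.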