Calculating principal components from new data in Python
I am using the following code to run a principal component analysis on the iris data:
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

iris = datasets.load_iris()
dat = pd.DataFrame(data=iris.data, columns=['sl', 'sw', 'pl', 'pw'])
stddat = scale(dat)  # standardize each column to zero mean and unit variance
pca = PCA(n_components=2)
pc_out = pca.fit_transform(stddat)
pcdf = pd.DataFrame(data=pc_out, columns=['PC-1', 'PC-2'])
print(pcdf.head())
Output:
       PC-1      PC-2
0 -2.264542  0.505704
1 -2.086426 -0.655405
2 -2.367950 -0.318477
3 -2.304197 -0.575368
4 -2.388777  0.674767
Now I want to determine PC-1 for a new set of values of 'sl', 'sw', 'pl' and 'pw', for example: 4.8, 3.1, 1.3, 0.2. How can I do this? I cannot find any way to do it with the sklearn library.
Edit: As mentioned in the comments, I can use pca.transform(new_data) to get the PC values for new data. However, I am interested in obtaining the variable loadings, so that I can later use these numbers anywhere to determine PC values, not just in the current environment. By loadings I mean "the weight by which each standardized original variable should be multiplied to get the component score" (from https://en.wikipedia.org/wiki/Principal_component_analysis ). I cannot find a method to do this on the documentation page: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
Here is the available transform function:
def transform(self, X):
    """Apply dimensionality reduction to X.

    X is projected on the first principal components previously extracted
    from a training set.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        New data, where n_samples is the number of samples
        and n_features is the number of features.

    Returns
    -------
    X_new : array-like, shape (n_samples, n_components)

    Examples
    --------
    >>> import numpy as np
    >>> from sklearn.decomposition import IncrementalPCA
    >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    >>> ipca = IncrementalPCA(n_components=2, batch_size=3)
    >>> ipca.fit(X)
    IncrementalPCA(batch_size=3, copy=True, n_components=2, whiten=False)
    >>> ipca.transform(X) # doctest: +SKIP
    """
    check_is_fitted(self, ['mean_', 'components_'], all_or_any=all)
    X = check_array(X)
    if self.mean_ is not None:
        X = X - self.mean_
    X_transformed = np.dot(X, self.components_.T)
    if self.whiten:
        X_transformed /= np.sqrt(self.explained_variance_)
    return X_transformed
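Read with whiten=False, the function body comes down to centering X by mean_ and multiplying by the transpose of components_. The following is a minimal sketch that checks this reading on the iris data from the question. The names `manual` and `same` are illustrative and not part of scikit-learn.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

# Same setup as in the question: standardized iris data, two components.
X = scale(datasets.load_iris().data)
pca = PCA(n_components=2).fit(X)

# Reproduce transform by hand: subtract mean_, then project onto components_.
manual = np.dot(X - pca.mean_, pca.components_.T)

# Compare against the library's own result, up to floating-point tolerance.
same = np.allclose(manual, pca.transform(X))
print(same)
```

If the comparison holds, the rows of pca.components_ are the weights applied to the centered input, which matches the definition of loadings quoted above.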
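The variable loadings are the components you get from pca.components_. Make sure your mean_ is 0 and whiten is False; then you can simply take that matrix and use it anywhere you want to transform your matrices/vectors. Because the PCA here was fitted on standardized data, any new values have to be standardized with the training data's mean and standard deviation before being multiplied by the matrix.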
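One way this could look in practice, as a sketch rather than a definitive recipe: the training mean and standard deviation are computed with NumPy (assuming the population standard deviation, ddof=0, which is what scale() uses), so that those two vectors and the loadings matrix are all that is needed to score a new observation elsewhere. The names `mu`, `sigma`, `loadings`, `new_obs` and `scores` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

X = datasets.load_iris().data

# Standardization parameters of the training data (ddof=0, as in scale()).
mu = X.mean(axis=0)
sigma = X.std(axis=0)

# Fit the PCA on the standardized data; rows of components_ are the loadings.
pca = PCA(n_components=2).fit((X - mu) / sigma)
loadings = pca.components_  # shape (n_components, n_features) = (2, 4)

# New observation from the question: sl, sw, pl, pw.
new_obs = np.array([4.8, 3.1, 1.3, 0.2])

# Standardize with the training parameters, then apply the loadings.
z = (new_obs - mu) / sigma
scores = z @ loadings.T  # scores[0] is PC-1, scores[1] is PC-2

# Cross-check against scikit-learn's own transform.
check = pca.transform(z.reshape(1, -1))[0]
print(np.allclose(scores, check))
```

Only `mu`, `sigma` and `loadings` need to be carried to another environment; the scoring step itself is plain arithmetic. The sign of a component is arbitrary, so a different fit may flip the sign of a score without changing its meaning.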