Calculating Principal Components from new data in Python

Question

I am using the following code to perform principal component analysis on the iris data:

import pandas as pd
from sklearn import datasets
iris = datasets.load_iris()
dat = pd.DataFrame(data=iris.data, columns=['sl', 'sw', 'pl', 'pw'])

from sklearn.preprocessing import scale
stddat = scale(dat)

from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pc_out = pca.fit_transform(stddat)
pcdf = pd.DataFrame(data = pc_out , columns = ['PC-1', 'PC-2'])
print(pcdf.head())

Output:

       PC-1      PC-2
0 -2.264542  0.505704
1 -2.086426 -0.655405
2 -2.367950 -0.318477
3 -2.304197 -0.575368
4 -2.388777  0.674767

Now I want to determine PC-1 for a new set of values of 'sl', 'sw', 'pl' and 'pw', for example: 4.8, 3.1, 1.3, 0.2. How can I do this? I cannot find any way to do this with the sklearn library.

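Edit: As mentioned in the comments, I can use pca.transform(new_data) to get the PC values for new data. However, I am interested in obtaining the variable loadings, so that I can later use these numbers anywhere to determine PC values, not just in the current environment.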

By loadings I mean "the weight by which each standardized original variable should be multiplied to get the component score" (from https://en.wikipedia.org/wiki/Principal_component_analysis). I cannot find a method to do this on the documentation page: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
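For reference, the pca.transform(new_data) route mentioned in the comments could look like the minimal sketch below. It assumes StandardScaler is used in place of scale(), since the new values have to be standardized with the training mean and standard deviation and scale() does not keep them. The names scaler, new_data and new_scores are illustrative only.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

# StandardScaler remembers the training mean and standard deviation,
# which the scale() function does not.
scaler = StandardScaler().fit(iris.data)
pca = PCA(n_components=2).fit(scaler.transform(iris.data))

# One new observation, in the order sl, sw, pl, pw.
new_data = np.array([[4.8, 3.1, 1.3, 0.2]])

# Standardize with the training statistics, then project.
new_scores = pca.transform(scaler.transform(new_data))
print(new_scores[0, 0])  # PC-1 score of the new observation
```

This still depends on the fitted scaler and pca objects being available, which is exactly the limitation described in the edit above.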

Answer 1

Here is the transform function from the scikit-learn source:

    def transform(self, X):
        """Apply dimensionality reduction to X.
        X is projected on the first principal components previously extracted
        from a training set.
        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            New data, where n_samples is the number of samples
            and n_features is the number of features.
        Returns
        -------
        X_new : array-like, shape (n_samples, n_components)
        Examples
        --------
        >>> import numpy as np
        >>> from sklearn.decomposition import IncrementalPCA
        >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
        >>> ipca = IncrementalPCA(n_components=2, batch_size=3)
        >>> ipca.fit(X)
        IncrementalPCA(batch_size=3, copy=True, n_components=2, whiten=False)
        >>> ipca.transform(X) # doctest: +SKIP
        """
        check_is_fitted(self, ['mean_', 'components_'], all_or_any=all)

        X = check_array(X)
        if self.mean_ is not None:
            X = X - self.mean_
        X_transformed = np.dot(X, self.components_.T)
        if self.whiten:
            X_transformed /= np.sqrt(self.explained_variance_)
        return X_transformed

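The variable loadings are the components you get from pca.components_. Make sure your mean_ is 0 (otherwise subtract it first, as transform does above) and that whiten is False; then you can simply take that matrix and use it anywhere you want to transform your matrices or vectors.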
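As a rough sketch of that idea, the snippet below standardizes the iris data by hand so the mean and standard deviation can be saved alongside pca.components_, then computes the scores for the new observation with plain NumPy and compares them with pca.transform. The names mu, sigma, loadings and manual_scores are illustrative, and the code assumes the default whiten=False.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

X = datasets.load_iris().data

# Standardize by hand so the mean and standard deviation can be saved.
mu = X.mean(axis=0)
sigma = X.std(axis=0)  # population std (ddof=0), the same convention as scale()
pca = PCA(n_components=2).fit((X - mu) / sigma)

# Each row of components_ holds the per-variable weights for one component.
loadings = pca.components_  # shape (2, 4): rows are PC-1 and PC-2

# mu, sigma and loadings are the only numbers needed outside this session.
new_x = np.array([4.8, 3.1, 1.3, 0.2])  # sl, sw, pl, pw
manual_scores = ((new_x - mu) / sigma) @ loadings.T

# Compare with sklearn's own transform on the same standardized values.
sklearn_scores = pca.transform(((new_x - mu) / sigma).reshape(1, -1))[0]
print(manual_scores, sklearn_scores)
```

Because the data were standardized before fitting, pca.mean_ is zero up to floating-point error, so skipping the mean subtraction inside the manual calculation gives the same scores as pca.transform.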

python

pca

scikit-learn