Need to perform principal component analysis on a dataframe collection in Python using numpy or sklearn

Question

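I have a 'dataframe collection' df with the data shown below. I am trying to perform principal component analysis (PCA) on the dataframe collection using sklearn, but I am getting a TypeError.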

from sklearn.decomposition import PCA
df  # dataframe collection
pca = PCA(n_components=5)
pca.fit(X)

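How can I convert the dataframe collection into an array matrix built from the series? I think that once it is converted to an array matrix, I will be able to run PCA.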

Data:

{'USSP2 CMPN Curncy': 
 0       0.297453
 1       0.320505
 2       0.345978
 3       0.427871
 Name: (USSP2 CMPN Curncy, PX_LAST), Length: 1747, dtype: float64, 
 'MARGDEBT Index': 
 0     0.095478
 1     0.167469
 2     0.186317
 3     0.203729
 Name: (MARGDEBT Index, PX_LAST), Length: 79, dtype: float64, 
 'SL% SMT% Index': 
 0     0.163636
 1     0.000000
 2     0.000000
 3     0.363636
 Name: (SL% SMT% Index, PX_LAST), dtype: float64, 
 'FFSRAIWS Index': 
 0     0.157234
 1     0.278174
 2     0.530603
 3     0.526519
 Name: (FFSRAIWS Index, PX_LAST), dtype: float64, 
 'USPHNSA Index': 
 0     0.107330
 1     0.213351
 2     0.544503
 3     0.460733
 Name: (USPHNSA Index, PX_LAST), Length: 79, dtype: float64}

Can anyone help me run PCA on this dataframe collection? Thanks!

Answer 1

Your dataframe collection is a dictionary (dict) of DataFrame objects.

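To run the analysis you need a single dataset, so the first step is to combine the data into one DataFrame. pandas natively supports concatenating a dict of DataFrames, for example: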

import pandas as pd

df = {
    'Currency1': pd.DataFrame([[0.297453,0.5]]),
    'Currency2': pd.DataFrame([[0.297453,0.5]])
}      

X = pd.concat(df)
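
To make the result of that call concrete, here is a small sketch with made-up values (the keys and numbers are illustrative only). It shows that `pd.concat` on a dict stacks the frames vertically and uses the dict keys as the outer level of the row index:

```python
import pandas as pd

# Illustrative values only; the keys are hypothetical.
frames = {
    'Currency1': pd.DataFrame([[0.297453, 0.5]]),
    'Currency2': pd.DataFrame([[0.320505, 0.7]]),
}

# With the default axis=0 the frames are stacked row-wise and the
# dict keys become the outer level of a MultiIndex on the rows.
stacked = pd.concat(frames)

print(stacked.shape)           # (2, 2)
print(stacked.index.tolist())  # [('Currency1', 0), ('Currency2', 0)]
```

Whether rows or columns should hold the individual series depends on what counts as a feature in the PCA.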

You can now run PCA on the values in the DataFrame, for example:

from sklearn.decomposition import PCA

pca = PCA(n_components=5)  # needs at least 5 rows and 5 columns in X
pca.fit(X.values)
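
The data printed in the question looks like a dict of Series with different lengths (each entry shows a `Name` and `dtype`, and the lengths are 1747 and 79). For that case, one possible approach is to place each Series in its own column and drop incomplete rows before fitting. The sketch below uses random stand-in data and hypothetical keys `A`, `B`, `C`. It aligns the series by integer position, which is an assumption; real market data would more likely need aligning on a date index.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-in for the question's dict: Series of different lengths.
series = {
    'A': pd.Series(rng.random(12)),
    'B': pd.Series(rng.random(8)),
    'C': pd.Series(rng.random(8)),
}

# axis=1 puts each Series in its own column; shorter ones are padded with NaN.
wide = pd.concat(series, axis=1)

# PCA cannot handle NaN, so keep only rows where every column has a value.
X = wide.dropna()

# n_components must not exceed min(n_samples, n_features).
pca = PCA(n_components=2)
scores = pca.fit_transform(X.values)

print(X.shape)       # (8, 3)
print(scores.shape)  # (8, 2)
```

Dropping rows discards data from the longer series. Resampling the series to a common frequency or imputing the missing values are alternatives, depending on the data.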

Tags: python, pca, sklearn-pandas