Reducing Dimensions using PCA: AttributeError: 'numpy.ndarray' object has no attribute 'items'

Question

I am trying to implement and run a sample project from DZone (https://dzone.com/articles/cv-r-cvs-retrieval-system-based-on-job-description), but I have run into a problem. In this case, I have set

dir_pca_we_EWE = 'pickle_model_pca.pkl'

and am running the following:

def reduce_dimensions_WE(dir_we_EWE, dir_pca_we_EWE):
    m1 = KeyedVectors.load_word2vec_format('./wiki.en/GoogleNews.bin', binary=True)
    model1 = {}
    # normalize vectors
    for string in m1.wv.vocab:
        model1[string] = m1.wv[string] / np.linalg.norm(m1.wv[string])
    # reduce dimensionality
    pca = decomposition.PCA(n_components=200)
    pca.fit(np.array(list(model1.values())))
    model1 = pca.transform(np.array(list(model1.values())))
    i = 0
    for key, value in model1.items():
        model1[key] = model1[i] / np.linalg.norm(model1[i])
        i = i + 1
    with open(dir_pca_we_EWE, 'wb') as handle:
        pickle.dump(model1, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return model1

This then produces the following error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 12, in reduce_dimensions_WE
AttributeError: 'numpy.ndarray' object has no attribute 'items'

As always, any help is greatly appreciated!

Answer 1

You first initialize model1 = {} as an empty dict. By calling transform in

model1 = pca.transform(np.array(list(model1.values())))

the variable model1 is rebound to a numpy.ndarray, which is the return type of pca's transform method. In the line

for key, value in model1.items():
    ...

you still use model1 as if it were a dict, which it no longer is. An ndarray has no .items() method, hence the AttributeError.
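
One possible way to avoid the problem is to keep the words and the vector matrix in separate variables, so the dict is never overwritten, and to rebuild a dict once the reduction is done. The following is only a minimal sketch under some assumptions: a small random dict stands in for the word2vec vocabulary, n_components is 3 rather than 200 so that the toy data is large enough, and the names words, vectors, reduced and reduced_model are illustrative and do not come from the original post.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy stand-in for the normalized word-vector dict built in the question.
rng = np.random.default_rng(0)
toy_words = ["cat", "dog", "car", "bus", "tree", "leaf"]
model1 = {w: rng.normal(size=10) for w in toy_words}
model1 = {w: v / np.linalg.norm(v) for w, v in model1.items()}

# Keep the keys and the matrix separately so model1 stays a dict.
words = list(model1.keys())
vectors = np.array([model1[w] for w in words])

# The question uses n_components=200; 3 is used here for the toy data.
pca = PCA(n_components=3)
reduced = pca.fit_transform(vectors)  # ndarray, shape (len(words), 3)

# Re-normalize each row to unit length, then rebuild a dict keyed by word.
reduced = reduced / np.linalg.norm(reduced, axis=1, keepdims=True)
reduced_model = dict(zip(words, reduced))
```

Here reduced_model is a plain dict again, so iterating over reduced_model.items() or pickling it works as the original function intended.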

Answer 2

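@datasailor answered your question and explained what went wrong. In the comments, you asked how to reduce the dimensionality of your data to 200. I think the simplest way is to use .fit_transform from sklearn.decomposition.PCA instead of .transform as you are currently doing: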

from sklearn.decomposition import PCA
pca = PCA(n_components=200)
# data: array of shape (n_samples, n_features)
lower_dim_Data = pca.fit_transform(data)
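
As a quick sanity check, the sketch below compares fit_transform with a separate fit followed by transform on the same array. It assumes random data of shape (50, 20) and n_components=5 purely for illustration; data, a and b are made-up names. Note that n_components cannot exceed min(n_samples, n_features), so reducing to 200 dimensions needs at least 200 samples and 200 features.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative random data: 50 samples with 20 features each.
rng = np.random.default_rng(42)
data = rng.normal(size=(50, 20))

# One step: fit the model and project the same data.
a = PCA(n_components=5, svd_solver="full").fit_transform(data)

# Two steps: fit first, then project.
pca = PCA(n_components=5, svd_solver="full")
pca.fit(data)
b = pca.transform(data)

# Both results are ndarrays with one row per sample and 5 columns.
print(type(a).__name__, a.shape, np.allclose(a, b))
```

Either way the result is a numpy.ndarray rather than a dict, so it should be stored under a new name instead of being assigned back to the dict variable.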

Tags: python, numpy, machine-learning, pca, numpy-ndarray