How to find the top three features of every principal component using pandas?
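I am following the given solution.
But that solution takes only the argmax() feature of each principal component. I want the top three instead. How can I do that?
Basically, I want to know which features have the most influence on each PC.
Thanks.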
You can use np.argsort or np.argpartition to get the indices of the largest absolute loadings, following the procedure from the question.
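To see how the two functions differ, here is a minimal sketch on a single made-up loading vector (the values are hypothetical, not from the question). argsort fully sorts the values, while argpartition only guarantees that the last k positions hold the k largest values, in no particular order, so its result needs one small extra sort before it is ranked.

```python
import numpy as np

# Hypothetical loadings of one principal component (not from the question)
component = np.array([0.30, -0.45, 0.10, 0.05, 0.80])
k = 3
abs_loadings = np.abs(component)

# argsort: ascending order, reversed, first k -> indices ranked largest first
top_by_argsort = np.argsort(abs_loadings)[::-1][:k]

# argpartition: the last k positions hold the k largest values,
# but their order among themselves is not guaranteed
top_unordered = np.argpartition(abs_loadings, -k)[-k:]
# Rank those k indices by |loading|, largest first
top_by_argpartition = top_unordered[np.argsort(abs_loadings[top_unordered])[::-1]]

print(top_by_argsort)       # [4 1 0]
print(top_by_argpartition)  # [4 1 0]
```

For a handful of features the full argsort is simplest; argpartition mainly pays off when there are many features and k is small, since it avoids sorting the whole array.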
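import numpy as np
import pandas as pd

# model is the fitted PCA from the question; n_pcs = model.components_.shape[0]
# With argsort: indices sorted by |loading|, largest first; keep the first 3
most_important = [np.argsort(np.abs(model.components_[i]))[::-1][:3] for i in range(n_pcs)]
# With argpartition: finds the top 3 without a full sort but leaves them unordered,
# so sort those 3 indices by |loading|, largest first
top3 = [np.argpartition(np.abs(model.components_[i]), -3)[-3:] for i in range(n_pcs)]
most_important = [t[np.argsort(np.abs(model.components_[i])[t])[::-1]] for i, t in enumerate(top3)]
most_important
>>> [array([4, 1, 0]), array([2, 3, 4])]
Then map the indices to feature names and collect them in a DataFrame, one row per component:
initial_feature_names = ['a', 'b', 'c', 'd', 'e']
# most_important[i] is already ordered from largest to smallest |loading|,
# so the names are taken in that order without reversing
most_important_names = [[initial_feature_names[j] for j in most_important[i]] for i in range(n_pcs)]
dic = {'PC{}'.format(i): most_important_names[i] for i in range(n_pcs)}
pd.DataFrame.from_dict(dic).T
>>>
     0  1  2
PC0  e  b  a
PC1  c  d  e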
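Putting it together, here is a self-contained sketch that ranks features with pandas' Series.nlargest instead of index arithmetic. It assumes scikit-learn is not at hand, so the principal axes are computed from an SVD of the centered data; those rows should match PCA.components_ up to sign, and taking the absolute value removes the sign. The random data and the helper name top_features_per_pc are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd


def top_features_per_pc(components, feature_names, k=3):
    """Hypothetical helper: one row per PC holding the k feature names with
    the largest absolute loadings, ordered from largest to smallest."""
    abs_loadings = pd.DataFrame(
        np.abs(components),
        index=['PC{}'.format(i) for i in range(components.shape[0])],
        columns=feature_names,
    )
    # Series.nlargest returns the k largest values of each row in descending order;
    # its index holds the matching feature names
    rows = {pc: row.nlargest(k).index.tolist() for pc, row in abs_loadings.iterrows()}
    return pd.DataFrame.from_dict(rows, orient='index')


# Hypothetical data: 10 samples with 5 features (random, for illustration only)
rng = np.random.default_rng(0)
X = rng.random((10, 5))

# PCA via SVD of the centered data: the rows of Vt are the principal axes
# (assumed equal to sklearn's PCA.components_ up to sign)
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]  # keep the first 2 components, as in the question

print(top_features_per_pc(components, ['a', 'b', 'c', 'd', 'e']))
```

Labelling the absolute loadings with the feature names up front means nlargest returns the names directly, so no separate index-to-name mapping step is needed. With a fitted scikit-learn model you would pass model.components_ as the first argument instead.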