DataFrame corr() does not find the column index

Question

I am trying to find the correlation (the Pearson correlation coefficient) between citable documents per person and Energy Supply per Capita.

So I created a table, Top15, whose columns are:

Index(['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations', 'Citations per document', 'H index', 'Energy Supply', 'Energy Supply per Capita', '% Renewable', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', 'popu', 'citable documents per person'], dtype='object')

But when I call Top15.corr(), the result only shows the correlations between

['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations',
           'Citations per document', 'H index', 
            '% Renewable', '2006', '2007', '2008',
           '2009', '2010', '2011', '2012', '2013', '2014', '2015']

with no citable documents per person and no Energy Supply per Capita.

Then I created a new table, df.

But when I call df.corr(), I get _, and when I also tried df.corr(method='pearson'), I got the same result _

I searched online and found this method:

from scipy.stats import pearsonr
corr, _ = pearsonr(Top15['citable documents per person'], Top15['Energy Supply per Capita'])
print(corr)

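This works, but I don't understand the expression corr, _ . Why the , _?

Can anyone help me understand:
1. Why do methods one and two fail? Why do some variables disappear from the corr result?
2. What is the , _ in method three?

Thanks.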

Answer 1

corr only considers numeric columns. Are you sure all of your columns are numeric? – BallpointBen
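One way to check this is to look at the dtypes and restrict the frame to its numeric columns before calling corr. The sketch below uses a small made-up frame (the column names and values are hypothetical, not the real Top15 data) in which one column holds numbers stored as strings:

```python
import pandas as pd

# Hypothetical data: "energy" holds numbers stored as strings,
# so it is forced to object dtype here to mimic the problem.
df = pd.DataFrame({
    "rank": [1, 2, 3, 4],
    "energy": pd.Series(["93", "286", "149", "124"], dtype=object),
})

# Inspect the dtypes; any column that is not int/float is not numeric.
print(df.dtypes)

# Keep only the numeric columns, then compute the correlation matrix.
numeric_part = df.select_dtypes(include="number")
corr_columns = list(numeric_part.corr().columns)
print(corr_columns)
```

Only "rank" survives, which mirrors how the two columns in the question vanish from the result. Depending on the pandas version, calling corr directly on a frame with non-numeric columns either drops them silently or raises an error, so selecting the numeric columns explicitly is the more predictable route.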

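_ means "ignore this value". For example, for k, v in dict.items() iterates over both keys and values; if you don't care about the values, you write for k, _ in dict.items() and the _ is ignored. – E.Serra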
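For the snippet in the question, pearsonr returns two values, the correlation coefficient and a p-value, so corr, _ = pearsonr(...) keeps the first and discards the second. Note that _ is an ordinary variable name used this way by convention, not special syntax. A minimal sketch with a hypothetical helper (mean_and_count is made up for illustration and stands in for any function that returns two values):

```python
def mean_and_count(values):
    """Hypothetical helper that returns two values as a tuple."""
    return sum(values) / len(values), len(values)

# Unpack the tuple: keep the mean, throw the count away into "_".
mean, _ = mean_and_count([1, 2, 3, 4])
print(mean)

# Same idea in a loop: only the keys are needed, so the values go to "_".
scores = {"a": 1, "b": 2}
keys = [k for k, _ in scores.items()]
print(keys)
```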

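I think this is a question from a Coursera assignment, which I did some time ago. Some of the columns here have type object, and you have to convert them to numeric using something from pandas. – 斯内希尔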
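The comment does not say which conversion to use. One option, assumed here, is pd.to_numeric. The sketch below borrows the two column names from the question, but the values are invented for illustration:

```python
import pandas as pd

# Hypothetical values; "Energy Supply per Capita" is stored as strings (object dtype).
top15 = pd.DataFrame({
    "Energy Supply per Capita": pd.Series(["93", "286", "149", "124"], dtype=object),
    "citable documents per person": [0.0001, 0.0003, 0.0002, 0.00015],
})

# Convert the object column to numbers; values that cannot be parsed become NaN.
top15["Energy Supply per Capita"] = pd.to_numeric(
    top15["Energy Supply per Capita"], errors="coerce"
)

# Both columns are numeric now, so both appear in the correlation matrix.
corr_matrix = top15.corr()
r = corr_matrix.loc["citable documents per person", "Energy Supply per Capita"]
print(list(corr_matrix.columns))
print(r)
```

With errors="coerce", a stray entry such as "..." would become NaN instead of raising, and corr skips NaN pairs, so it is worth checking how many values were lost after the conversion.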

python

correlation