Calculating Kendall's tau using scipy and groupby

Question

I have a csv file with precipitation data per year and per weather station. It looks like this:

station_id    year       Sum
 210018      1916      65.024
 210018      1917      35.941
 210018      1918      28.448
 210018      1919      68.58
 210018      1920      31.115
 215400      1916      44.958
 215400      1917      31.496
 215400      1918      38.989
 215400      1919      74.93
 215400      1920      53.5432

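I want to return Kendall's tau correlation and the p-value for each unique station ID. So for the data above, I want the correlation between Sum and year for station ID 210018 and for station ID 215400.

The correlation for station_id 210018 would be -.20 with a p-value of .62, and the correlation for station_id 215400 would be .40 with a p-value of .33.

I am trying to use this: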

grouped=df.groupby(['station_id'])
grouped.aggregate([tau, p_value=sp.stats.kendalltau(df.year, df.Sum)])

The error returned is a syntax error at the equals sign after p_value.

Any help would be appreciated.

Answer 1

One way to calculate this is to use apply on the groupby object:

>>> import scipy.stats as st
>>> df.groupby(['station_id']).apply(lambda x: st.kendalltau(x['year'], x['Sum']))
station_id
210018        (-0.2, 0.62420612399)
215400        (0.4, 0.327186890661)
dtype: object
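
If separate tau and p_value columns are more convenient than a column of tuples, the same apply approach can return a pandas Series per group. The following is a minimal sketch rather than a definitive implementation: the helper name kendall_by_group and the inline DataFrame (rebuilt from the sample rows in the question) are illustrative assumptions. Passing method="asymptotic" is also an assumption, made so that the p-values match the normal-approximation figures quoted above; recent SciPy versions may pick an exact test for small samples without ties and report different p-values.

```python
import pandas as pd
from scipy import stats

# Sample data rebuilt from the rows shown in the question.
df = pd.DataFrame({
    "station_id": [210018] * 5 + [215400] * 5,
    "year": [1916, 1917, 1918, 1919, 1920] * 2,
    "Sum": [65.024, 35.941, 28.448, 68.58, 31.115,
            44.958, 31.496, 38.989, 74.93, 53.5432],
})

def kendall_by_group(group):
    """Hypothetical helper: Kendall's tau and p-value for one station."""
    # method="asymptotic" is an assumption; it requests the normal approximation.
    tau, p_value = stats.kendalltau(group["year"], group["Sum"],
                                    method="asymptotic")
    return pd.Series({"tau": tau, "p_value": p_value})

# Select only the columns the helper needs, then apply it per station.
result = df.groupby("station_id")[["year", "Sum"]].apply(kendall_by_group)
print(result)
```

Because the helper returns a Series, the result is a DataFrame indexed by station_id with one column per statistic, so it can be filtered or merged like any other table.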
