pandas calculate variance from aggregation

Question

I have a DataFrame with these columns: Date, ID, and Value. I need to compute the mean, median, and variance of Value, and I am using .agg like this:

df = dataset\
    .groupby(['ID', pd.Grouper(key='Date', freq='60T')])['Value']\
    .agg(['mean', 'median', 'var'])\
    .reset_index()

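It calculates the mean successfully, but when it comes to the median it simply repeats the mean, and it does not store or create the var column. The result looks like this: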

      ID                 Date      mean    median  var
0  13834  2017-02-09 12:00:00  1.474920  1.474920  NaN
1  13834  2017-02-09 16:00:00  4.424796  4.424796  NaN
2  13834  2017-02-09 20:00:00  2.241871  2.241871  NaN
3  13834  2017-02-10 00:00:00  2.654867  2.654867  NaN
4  13834  2017-02-10 04:00:00  2.654867  2.654867  NaN
5  13834  2017-02-10 08:00:00  0.511062  0.511062  NaN

There should be a variance column after the last number, but I get nothing (or NaNs, when displayed in the DataFrame). How can I fix this?

Answer 1

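Solution:

This happens because you have only 1 row per group. You can check this with a dummy example: df.groupby(df.index).agg(["mean", "median", "var"]).reset_index(). By default pandas uses the sample variance estimator with a 1/(N-1) factor, which returns NaN when N=1. With a single value per group, the mean and the median are both equal to that value, which is why the two columns are identical. http://en.wikipedia.org/wiki/Variance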
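To make the single-row explanation concrete, here is a minimal sketch. The sample data is made up for illustration (it is not the asker's dataset), and the frequency is written as "60min", since the "60T" alias is deprecated in recent pandas releases. Adding a count column shows which groups have only one row, and passing ddof=0 switches to the population variance, which is 0 rather than NaN for a single value. Whether ddof=0 is appropriate depends on what the variance is meant to estimate.

```python
import pandas as pd

# Hypothetical sample data: ID 1 has two readings in the same hour, ID 2 has one.
dataset = pd.DataFrame({
    "ID": [1, 1, 2],
    "Date": pd.to_datetime([
        "2017-02-09 12:05",
        "2017-02-09 12:35",
        "2017-02-09 12:10",
    ]),
    "Value": [1.0, 3.0, 5.0],
})

grouped = dataset.groupby(["ID", pd.Grouper(key="Date", freq="60min")])["Value"]

# "count" reveals the group sizes; "var" uses ddof=1 by default,
# so any group with a single row gets NaN.
stats = grouped.agg(["count", "mean", "median", "var"]).reset_index()
print(stats)

# Population variance (ddof=0) divides by N instead of N-1,
# so a single-row group yields 0.0 instead of NaN.
pop_var = grouped.var(ddof=0).reset_index(name="var_pop")
print(pop_var)
```

If most groups show a count of 1, widening the Grouper frequency is another option, since the sample variance needs at least two observations per group.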

python

variance

pandas