Apply aggregate function to a datatable column and return value, not datatable

Question

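Maybe a stupid question, but...

In R data.table, if I want the mean of a column, I can refer to the column vector as foo$x and compute its mean with mean(foo$x).

I can't figure out how to do the same thing with Python datatable. For example,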

# imports
import numpy as np
import datatable as dt
from datatable import f

# make datatable
np.random.seed(1)
foo = dt.Frame({'x': np.random.randn(10)})

# calculate mean
dt.mean(foo.x)  # error
dt.mean(foo[:, f.x])  # Expr:mean(<Frame [10 rows x 1 col]>) ???
foo[:, dt.mean(f.x)][0, 0]  # -0.0971

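The last statement technically works, but it seems overly cumbersome: it first returns a 1x1 datatable, from which I then have to extract the only value. The fundamental issue I'm struggling with is that I don't understand whether column vectors exist in Python datatable and/or how to refer to them.

In short, is there a simpler way to compute the mean of a column with Python datatable?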

Answer 1

To generalize slightly, let's start with a Frame that has several columns:

>>> import numpy as np
>>> from datatable import f, dt
>>> np.random.seed(1)
>>> foo = dt.Frame(x=np.random.randn(10), y=np.random.randn(10))
>>> foo
            x           y
--  ---------  ----------
 0   1.62435    1.46211  
 1  -0.611756  -2.06014  
 2  -0.528172  -0.322417 
 3  -1.07297   -0.384054 
 4   0.865408   1.13377  
 5  -2.30154   -1.09989  
 6   1.74481   -0.172428 
 7  -0.761207  -0.877858 
 8   0.319039   0.0422137
 9  -0.24937    0.582815 

[10 rows x 2 columns]

First, the simple .mean() method returns a 1x2 Frame containing the mean of each column:

>>> foo.mean()
             x          y
--  ----------  ---------
 0  -0.0971409  -0.169588

[1 row x 2 columns]

If you want the mean of a single column, you have to select that column from foo first: foo[:, f.y], or foo[:, 'y'], or simply foo['y']:

>>> foo['y'].mean()
            y
--  ---------
 0  -0.169588

[1 row x 1 column]

Now, if you want a number rather than a 1x1 Frame, you can either select it with [0, 0] or call the .mean1() method:

>>> foo['y'].mean()[0, 0]
-0.1695883821153589

>>> foo['y'].mean1()
-0.1695883821153589

python

py-datatable