Is there a way of performing arithmetic operations on an entire Frame in Python datatable?

Question

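This question is about the recent h2o datatable package. I want to replace pandas code with this library to improve performance.

The question is simple: I need to divide/sum/multiply/subtract an entire Frame or several selected columns.

In pandas, to divide every column except the first by 3, one can write: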

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "C0": np.random.randn(10000), 
    "C1": np.random.randn(10000)
})
df.iloc[:,1:] = df.iloc[:,1:]/3

In the datatable package, this can only be done for one selected column at a time:

import datatable as dt
from datatable import f
import numpy as np

df = dt.Frame(np.random.randn(1000000))
df[:, "C1"] = dt.Frame(np.random.randn(1000000))
# Divide every column except the first by 3, one column at a time
for i in range(1, df.ncols):
    df[:, i] = df[:, f[i] / 3]

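So far, in Python 3.6 (I don't know about 3.7), the FrameProxy f does not accept slices. I'm just asking whether there is a better way than a loop to perform this kind of Frame arithmetic; I couldn't find one in the documentation.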

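EDIT:

The latest commit, #1962, added a feature related to this question. If I am able to run the latest source version, I will add an answer myself that uses the new feature.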

Answer 1

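You are right, the f-symbol does not currently support slice expressions (an interesting idea, by the way; maybe it could be added in the future?).

However, the right-hand side of an assignment can be a list of expressions, which allows you to write the following: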

df = dt.Frame(C0=np.random.randn(1000000),
              C1=np.random.randn(1000000))

df[:, 1:] = [f[i]/3 for i in range(1, df.ncols)]

Answer 2

As of January 2019, the versions of datatable installed through pip for both Python 3.6 and 3.7 support slices in f-expressions, and this is documented. So the solution is simple:

import datatable as dt
from datatable import f
import numpy as np

# generate some data to test
df = dt.Frame(C0=np.random.randn(1000000),
              C1=np.random.randn(1000000))

df[:, 1:] = df[:, f[1:]/3]


python

py-datatable