How can I build an empty table to eventually hold a timestamp column and two float columns, strictly using numpy and no pandas allowed?

Question

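I'm a numpy beginner. I can't figure out (from online tutorials) how to make a table with 3 columns, the first being a timestamp and the other two of type float, for any given length nrows.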

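All the examples seem to use pandas. I tried calling pd.read_csv() on data in the right format and inspecting the result in my debugger, but it has no dtype attribute. Instead it has dtypes, which was a length-7 tuple of floats and ints with no datetime, and the 7 has nothing to do with the data I passed in.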

Attempt:

import numpy as np

# Desired result:
# Type: datetime64, float, float
#       (a date)  ,  0.1,  0.2
#       (a date)  ,  0.3,  0.4
#        ...
#       nrows in length

nrows = 64
table = np.empty(shape=(nrows, 3), dtype=('datetime64', float, float))

print(table.dtype)

Gives:

line 11, in <module>
  table = np.empty(shape=(nrows, 3), dtype=('datetime64', float, float))

builtins.TypeError: Tuple must have size 2, but has size 3

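So I'm not sure how shape and dtype are supposed to relate to each other.

I'm fairly sure my shape is correct, so what is the correct usage of dtype for this use case?

I don't want to use pandas here because it runs very slowly on my machine, and I enjoy the speed of numpy and its C implementation.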

Answer 1

Try this:

table = np.empty(shape=(nrows, 3), dtype=('datetime64,float,float'))

Answer 2

You want to create a structured array. There is a whole documentation page on them. Getting the dtype specification right takes some reading. One format:

In [231]: table = np.zeros(shape=(4,), dtype='datetime64[ns],f,f')
In [232]: table
Out[232]: 
array([('1970-01-01T00:00:00.000000000', 0., 0.),
       ('1970-01-01T00:00:00.000000000', 0., 0.),
       ('1970-01-01T00:00:00.000000000', 0., 0.),
       ('1970-01-01T00:00:00.000000000', 0., 0.)],
      dtype=[('f0', '<M8[ns]'), ('f1', '<f4'), ('f2', '<f4')])

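This has shape (4,) with 3 fields, not (4,3). Data for such an array is supplied field by field, as a tuple for a single element, or as a list of tuples. In other words, the input data needs to have the same layout as the display above.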
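As a minimal sketch of those three ways of supplying data, applied to the table from the question: the field names `time`, `a` and `b` and the sample values are illustrative assumptions, not part of the original post.

```python
import numpy as np

nrows = 64

# Named fields instead of the default f0/f1/f2 (names are illustrative).
dtype = np.dtype([("time", "datetime64[ns]"), ("a", "f8"), ("b", "f8")])

# One-dimensional array: nrows records, each holding three fields.
table = np.zeros(nrows, dtype=dtype)

# 1. Fill a whole field (column) at once.
start = np.datetime64("2021-01-01T00:00:00", "ns")
table["time"] = start + np.arange(nrows) * np.timedelta64(1, "s")
table["a"] = np.linspace(0.0, 1.0, nrows)

# 2. Fill a single record with a tuple.
table[0] = (np.datetime64("2021-01-01T00:00:00"), 0.1, 0.2)

# 3. Build an array from a list of tuples with the same layout.
small = np.array(
    [("2021-01-01", 0.1, 0.2), ("2021-01-02", 0.3, 0.4)],
    dtype=dtype,
)

print(table.shape, table.dtype.names)
```

Note that the shape is `(nrows,)`: the three "columns" live in the dtype, not in a second axis.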

The pandas equivalent:

In [234]: import pandas as pd
In [235]: df = pd.DataFrame(table)
In [236]: df
Out[236]: 
          f0   f1   f2
0 1970-01-01  0.0  0.0
1 1970-01-01  0.0  0.0
2 1970-01-01  0.0  0.0
3 1970-01-01  0.0  0.0
In [238]: df.dtypes
Out[238]: 
f0    datetime64[ns]
f1           float32
f2           float32

And back to an array:

In [239]: df.to_records(index=False)
Out[239]: 
rec.array([('1970-01-01T00:00:00.000000000', 0., 0.),
           ('1970-01-01T00:00:00.000000000', 0., 0.),
           ('1970-01-01T00:00:00.000000000', 0., 0.),
           ('1970-01-01T00:00:00.000000000', 0., 0.)],
          dtype=[('f0', '<M8[ns]'), ('f1', '<f4'), ('f2', '<f4')])

np.genfromtxt with dtype=None can load a csv file in much the same way as pd.read_csv (though the pandas reader is usually faster).
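A rough sketch of that route, without pandas. The in-memory CSV text, the field names `time`, `a`, `b`, and the choice of `encoding="utf-8"` are assumptions for illustration. With `dtype=None` the date column is expected to come back as strings, so it is converted to `datetime64` in a separate step.

```python
import io

import numpy as np

# Stand-in for a real file; genfromtxt also accepts a filename.
csv_text = "2021-01-01,0.1,0.2\n2021-01-02,0.3,0.4\n"

# dtype=None lets genfromtxt infer a type per column and return a
# structured array with default field names f0, f1, f2.
raw = np.genfromtxt(
    io.StringIO(csv_text), delimiter=",", dtype=None, encoding="utf-8"
)

# Copy into a table with the desired field types.
table = np.empty(
    raw.shape,
    dtype=[("time", "datetime64[D]"), ("a", "f8"), ("b", "f8")],
)
table["time"] = raw["f0"].astype("datetime64[D]")  # ISO date strings to dates
table["a"] = raw["f1"]
table["b"] = raw["f2"]

print(table)
```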

Tags: python, numpy, time-series, python-3.x, numpy-ndarray