pandas DataFrame 在传递给 kdb+ 时删除索引(使用 qPython API)

pandas DataFrame drops index when passing to kdb+ (using qPython API)

我正在尝试将时间序列数据从 Python 传递到 q/kdb+

一种解决方案是 qPython module,提供从 q table/dictionary 到 Pandas 的无缝转换。

问题是当试图将 Pandas 传递到 q 时,DataFrame 中的时间索引(在列 Date) 并没有完全到达 q 一边。可重现代码:

import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5]  # explore first 5 rows of the DataFrame
# Out:
#             Open  High  Low  Close    Volume  Adj Close
#    Date
# 2010-01-04 10.17 10.28 10.05 10.28  60855800       9.43 
# 2010-01-05 10.45 11.24 10.40 10.96 215620200      10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600      10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700      10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000      10.72

q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server 
# Out:
#    Open  High  Low  Close    Volume  Adj Close
# 0 10.17 10.28 10.05 10.28  60855800       9.43 
# 1 10.45 11.24 10.40 10.96 215620200      10.05
# 2 11.21 11.46 11.13 11.37 200070600      10.43
# 3 11.46 11.69 11.32 11.66 130201700      10.69
# 4 11.67 11.74 11.46 11.69 130463000      10.72

如您所见,q table 没有 f DataFrame 中的 Date 列作为索引。

如何有效地(对于大数据)将日期时间索引传递给 q?

您是否考虑过尝试其他 API 之一? http://www.timestored.com/kdb-guides/python-api 我也会在他们的 github 页面上报告这个问题。

在序列化 DataFrame 对象时,qPython 检查是否存在 meta 属性。如果属性不存在,DataFrame 被序列化为 q table 并且在此过程中索引列被 skipped。如果你想 保留 索引列,你必须设置 meta 属性并提供类型提示以强制表示 q 键控 table.

请看一下修改后的样本:

import pandas.io.data as web
import datetime
import numpy
import qpython.qconnection as qconnection # requires installation of qPython module from https://github.com/exxeleron/qPython

from qpython import MetaData
from qpython.qtype import QKEYED_TABLE


start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f=web.DataReader("F", 'yahoo', start, end) # download Ford stock data (ticker "F") from Yahoo Finance web service
f.ix[:5]  # explore first 5 rows of the DataFrame
# Out:
#             Open  High  Low  Close    Volume  Adj Close
#    Date
# 2010-01-04 10.17 10.28 10.05 10.28  60855800       9.43 
# 2010-01-05 10.45 11.24 10.40 10.96 215620200      10.05
# 2010-01-06 11.21 11.46 11.13 11.37 200070600      10.43
# 2010-01-07 11.46 11.69 11.32 11.66 130201700      10.69
# 2010-01-08 11.67 11.74 11.46 11.69 130463000      10.72

q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True) # define connection interface parameters. Assumes we have previously started q server on port 5000 with `q.exe -p 5000` command
q.open() # open connection
f.meta = MetaData(**{'qtype': QKEYED_TABLE}) # enforce to serialize DataFrame as keyed table
q('set', numpy.string_('yahoo'), f) # pass DataFrame to q table named `yahoo`
q('5#yahoo') # display top 5 rows from newly created table on q server 
# Out:
#              Open   High    Low  Close     Volume  Adj Close
# Date                                                         
# 2010-01-04  10.17  10.28  10.05  10.28   60855800       9.43
# 2010-01-05  10.45  11.24  10.40  10.96  215620200      10.05
# 2010-01-06  11.21  11.46  11.13  11.37  200070600      10.43
# 2010-01-07  11.46  11.69  11.32  11.66  130201700      10.69
# 2010-01-08  11.67  11.74  11.46  11.69  130463000      10.72