Passing a pandas DataFrame containing a string column to kdb+ (using the qPython API)

The exxeleron/qPython module allows sending a pandas DataFrame to kdb+/q as a table.

Let's prepare the data:

import pandas.io.data as web
import datetime
import numpy

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f = web.DataReader(["F","MSFT"], 'yahoo', start, end) # download stock data from Yahoo Finance
f = f.to_frame().reset_index() # flatten the MultiIndex to have a sym column, see below
f = f[["Date","minor","Close"]]
f.columns = ["dt","sym","val"] # just give comfortable names

The DataFrame object to be passed looks like this:

f.head()
# Out: 
#           dt   sym    val
# 0 2010-01-04     F  10.28
# 1 2010-01-04  MSFT  30.95
# 2 2010-01-05     F  10.96
# 3 2010-01-05  MSFT  30.96
# 4 2010-01-06     F  11.37

f.dtypes
# Out: 
# dt     datetime64[ns]
# sym            object
# val           float64    

When I try to send it to kdb+/q, I get the following error:

import qpython.qconnection as qconnection
q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True)
q.open()
q('set', numpy.string_('tbl'), f)

# File "G:\Anaconda\lib\site-packages\qpython\_pandas.py", line 159, in _write_pandas_series
#   data = data.fillna(QNULLMAP[-abs(qtype)][1])
# KeyError: -10
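
The numbers in the traceback follow q's type codes, where 10 is a char list (a q string) and -10 is a char atom. The failing expression `QNULLMAP[-abs(qtype)]` therefore suggests that the sym column was treated as a string column and that the null-value lookup had no entry for char atoms. The sketch below reproduces that lookup pattern with a hypothetical, heavily reduced table; `TOY_NULL_MAP` and `null_for` are illustrative names, not qPython's actual data or API.

```python
# Toy reproduction of the failing lookup pattern from the traceback.
# TOY_NULL_MAP is a hypothetical, reduced stand-in for a null-value table
# keyed by q atom type codes; it is NOT qPython's real QNULLMAP.
TOY_NULL_MAP = {
    -9: ("0n", float("nan")),   # float atom
    -11: ("`", ""),             # symbol atom
}

Q_CHAR_LIST = 10   # q type code of a char list, i.e. a q string


def null_for(qtype):
    """Look up the null value for the atom type behind a list type code."""
    return TOY_NULL_MAP[-abs(qtype)][1]


try:
    null_for(Q_CHAR_LIST)
    error = None
except KeyError as exc:
    # The key that is missing is -abs(10), i.e. -10, as in the traceback.
    error = exc

print(repr(error))
```

A symbol list (type code 11) passes the same lookup, because -11 is present in the table.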

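The sym column in the DataFrame is ambiguous to qPython, which cannot determine the correct default serialization for it. In this case, you have to provide a type hint for the column conversion by setting the meta attribute: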

from qpython import MetaData 
from qpython.qtype import QSYMBOL_LIST

f.meta = MetaData(sym = QSYMBOL_LIST)
q('set', numpy.string_('tbl'), f)

This instructs qPython to serialize the sym column as a q symbol list:

q)meta tbl                              
c  | t f a                              
---| -----                              
dt | p                                  
sym| s                                  
val| f 

q)tbl                                   
dt                            sym  val  
----------------------------------------
2010.01.04D00:00:00.000000000 F    10.28
2010.01.04D00:00:00.000000000 MSFT 30.95
2010.01.05D00:00:00.000000000 F    10.96
..
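
If a DataFrame has several object columns, the hints can be collected programmatically instead of being typed one by one. The following is a minimal sketch, not part of qPython: `build_hints` is a hypothetical helper, and a placeholder string stands in for `QSYMBOL_LIST` so that the block runs without qPython installed.

```python
def build_hints(dtypes, hint):
    """Return {column: hint} for every column whose dtype name is 'object'.

    dtypes: mapping of column name -> dtype name (plain strings).
    hint:   the type hint to assign, e.g. qpython.qtype.QSYMBOL_LIST.
    """
    return {col: hint for col, dtype in dtypes.items() if dtype == "object"}


# Dtype names copied from the f.dtypes output shown earlier.
dtypes = {"dt": "datetime64[ns]", "sym": "object", "val": "float64"}

# Placeholder hint value; with qPython this would be QSYMBOL_LIST itself.
hints = build_hints(dtypes, "QSYMBOL_LIST")
print(hints)

# With qPython available, the intended (untested) use would be:
#   from qpython import MetaData
#   from qpython.qtype import QSYMBOL_LIST
#   f.meta = MetaData(**build_hints(f.dtypes.astype(str).to_dict(), QSYMBOL_LIST))
```

Only the sym column receives a hint here; dt and val keep qPython's default serialization.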

Alternatively, you can represent the sym column as a generic list of strings. You can also apply type conversions to other columns:

from qpython import MetaData 
from qpython.qtype import QSTRING_LIST, QINT_LIST, QDATETIME_LIST

f.meta = MetaData(sym = QSTRING_LIST, val = QINT_LIST, dt = QDATETIME_LIST)
q('set', numpy.string_('tbl'), f)

This results in:

q)meta tbl                          
c  | t f a                          
---| -----                          
dt | z                              
sym|                                
val| i  

q)tbl                               
dt                      sym    val  
----------------------------------  
2010.01.04T00:00:00.000 "F"    10   
2010.01.04T00:00:00.000 "MSFT" 30   
..
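
Note that the val column above shows 10 and 30 for closing prices of 10.28 and 30.95, which is consistent with truncation toward zero rather than rounding. A minimal sketch of the difference, assuming the conversion behaves like a plain float-to-int cast (an assumption based only on the output shown, not on qPython's source):

```python
# Closing prices taken from the first two rows of the original table.
closes = [10.28, 30.95]

# A plain cast drops the fractional part (truncation toward zero).
as_int = [int(v) for v in closes]

# Rounding to the nearest integer would give a different value for 30.95.
rounded = [round(v) for v in closes]

print(as_int, rounded)
```

If the fractional part matters, keeping val as a float column (the default in the first example) avoids the loss.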