Passing a pandas DataFrame containing a string column to kdb+ (using the qPython API)
The exxeleron/qPython module allows sending a pandas DataFrame to kdb+/q as a table.
Let's prepare the data:
import pandas.io.data as web  # legacy pandas module; later moved to the separate pandas_datareader package
import datetime
import numpy
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f = web.DataReader(["F","MSFT"], 'yahoo', start, end) # download stock data from Yahoo Finance
f = f.to_frame().reset_index() # flatten the MultiIndex to have a sym column, see below
f = f[["Date","minor","Close"]]
f.columns = ["dt","sym","val"] # give the columns short, convenient names
The DataFrame to be passed looks like this:
f.head()
# Out:
# dt sym val
# 0 2010-01-04 F 10.28
# 1 2010-01-04 MSFT 30.95
# 2 2010-01-05 F 10.96
# 3 2010-01-05 MSFT 30.96
# 4 2010-01-06 F 11.37
f.dtypes
# Out:
# dt datetime64[ns]
# sym object
# val float64
When I try to send it to kdb+/q, I get the following error:
import qpython.qconnection as qconnection
q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True)
q.open()
q('set', numpy.string_('tbl'), f)
# File "G:\Anaconda\lib\site-packages\qpython\_pandas.py", line 159, in _write_pandas_series
# data = data.fillna(QNULLMAP[-abs(qtype)][1])
# KeyError: -10
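The -10 in the KeyError is a q type code. In kdb+, a positive type number denotes a list (vector) and the corresponding negative number an atom; 10 is char, so the lookup that failed was for the char type. The stdlib-only sketch below decodes such codes and also records the one-letter type characters that appear in the t column of meta output. The table and the function name describe_qtype are illustrative and not part of qPython.

```python
# Hypothetical helper, not part of qPython: decode kdb+ type numbers.
# Each entry maps a positive q type number to (name, meta type character).
Q_TYPES = {
    1: ("boolean", "b"),
    2: ("guid", "g"),
    4: ("byte", "x"),
    5: ("short", "h"),
    6: ("int", "i"),
    7: ("long", "j"),
    8: ("real", "e"),
    9: ("float", "f"),
    10: ("char", "c"),
    11: ("symbol", "s"),
    12: ("timestamp", "p"),
    13: ("month", "m"),
    14: ("date", "d"),
    15: ("datetime", "z"),
    16: ("timespan", "n"),
    17: ("minute", "u"),
    18: ("second", "v"),
    19: ("time", "t"),
}


def describe_qtype(code):
    """Return e.g. 'char atom' for -10 or 'symbol list' for 11."""
    if code == 0:
        return "general list"  # type 0: mixed list, blank t in meta
    name, _ = Q_TYPES[abs(code)]
    return name + (" atom" if code < 0 else " list")


print(describe_qtype(-10))  # char atom
print(describe_qtype(11))   # symbol list
```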
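The sym column in the DataFrame has dtype object, so its type is ambiguous to qpython and it cannot determine a default serialization for it. In this case you have to provide a type hint for the column conversion by setting the meta attribute: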
from qpython import MetaData
from qpython.qtype import QSYMBOL_LIST
f.meta = MetaData(sym = QSYMBOL_LIST)
q('set', numpy.string_('tbl'), f)
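If a frame has several string columns, the hints can be generated instead of typed by hand. The sketch below is a hypothetical helper (object_column_hints is not part of qPython): it takes any mapping of column name to dtype, such as f.dtypes, and assigns the given hint to every column whose dtype is object. It assumes a single q type is wanted for all such columns; free-text columns may be better sent as QSTRING_LIST.

```python
def object_column_hints(dtypes, hint):
    """Map every object-dtype column to `hint`.

    `dtypes` is any mapping of column name -> dtype (e.g. DataFrame.dtypes);
    comparing str(dtype) avoids importing pandas or numpy here.
    """
    return {col: hint for col, dtype in dtypes.items() if str(dtype) == "object"}


# Stand-in for f.dtypes from the example above.
dtypes = {"dt": "datetime64[ns]", "sym": "object", "val": "float64"}
print(object_column_hints(dtypes, "QSYMBOL_LIST"))  # {'sym': 'QSYMBOL_LIST'}
```

With qPython installed, and assuming MetaData accepts column names as keyword arguments as in the calls above, it could be used as f.meta = MetaData(**object_column_hints(f.dtypes, QSYMBOL_LIST)).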
This instructs qpython to serialize the sym column as a q symbol list:
q)meta tbl
c | t f a
---| -----
dt | p
sym| s
val| f
q)tbl
dt sym val
----------------------------------------
2010.01.04D00:00:00.000000000 F 10.28
2010.01.04D00:00:00.000000000 MSFT 30.95
2010.01.05D00:00:00.000000000 F 10.96
..
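The p in the meta output above is the q timestamp type, which stores a 64-bit count of nanoseconds since the kdb+ epoch 2000.01.01, so it keeps the nanosecond resolution of the pandas datetime64[ns] column. A stdlib sketch of that encoding (to_q_timestamp is an illustrative name; qPython performs its own conversion internally):

```python
from datetime import datetime

Q_EPOCH = datetime(2000, 1, 1)  # kdb+ epoch


def to_q_timestamp(dt):
    """Nanoseconds since 2000.01.01, as stored by q type p (timestamp)."""
    delta = dt - Q_EPOCH
    # Integer arithmetic only, so no precision is lost to floats.
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


print(to_q_timestamp(datetime(2010, 1, 4)))  # 315878400000000000
```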
Alternatively, you can represent the sym column as a general list of strings. Type conversions can be applied to the other columns as well:
from qpython import MetaData
from qpython.qtype import QSTRING_LIST, QINT_LIST, QDATETIME_LIST
f.meta = MetaData(sym = QSTRING_LIST, val = QINT_LIST, dt = QDATETIME_LIST)
q('set', numpy.string_('tbl'), f)
This results in:
q)meta tbl
c | t f a
---| -----
dt | z
sym|
val| i
q)tbl
dt sym val
----------------------------------
2010.01.04T00:00:00.000 "F" 10
2010.01.04T00:00:00.000 "MSFT" 30
..
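With QDATETIME_LIST the column becomes the q datetime type z, which stores a float count of days since 2000.01.01 and is displayed with millisecond precision. The val output also shows that the QINT_LIST hint dropped the fractional part (10.28 became 10, 30.95 became 30). A stdlib sketch of both effects, with to_q_datetime as an illustrative name:

```python
from datetime import datetime

Q_EPOCH = datetime(2000, 1, 1)  # kdb+ epoch


def to_q_datetime(dt):
    """Fractional days since 2000.01.01, as stored by q type z (datetime)."""
    return (dt - Q_EPOCH).total_seconds() / 86400


# Midnight dates map to whole days; 2010.01.04 is day 3656.
print(to_q_datetime(datetime(2010, 1, 4)))  # 3656.0

# The q output above is consistent with truncation toward zero, like int().
print([int(v) for v in (10.28, 30.95)])  # [10, 30]
```

If rounding rather than truncation is wanted, round the column before sending it.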