How to deserialize a RecordBatch from a pyarrow Buffer
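My goal is to serialize a RecordBatch, send it over a websocket channel, and deserialize it on the receiving side.
On the receiving end, after receiving the packets and rebuilding a pyarrow.lib.Buffer object with pa.py_buffer, I am unable to deserialize it back into a RecordBatch.
Leaving the websocket boilerplate aside, here is a snippet that summarizes what I am trying to do: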
import pyarrow as pa
indicators = [(1, 'A'), (2, 'B')]
id = pa.int16()
name = pa.string()
data = pa.array(indicators, type=pa.struct([('id', id), ('name', name)]))
batch = pa.RecordBatch.from_arrays([data], ['indicators'])
buffer = batch.serialize()
# How to get back a RecordBatch from buffer?
#
# ???
When you use the serialize method like this, you can read the data back with the read_record_batch function, provided the schema is known:
>>> pa.ipc.read_record_batch(buffer, batch.schema)
<pyarrow.lib.RecordBatch at 0x7ff412257278>
However, this means the receiving end has to know the schema in advance. To embed the schema in the serialized data, use RecordBatchStreamWriter instead:
>>> sink = pa.BufferOutputStream()
>>> writer = pa.RecordBatchStreamWriter(sink, batch.schema)
>>> writer.write_batch(batch)
>>> writer.close()
>>> buf = sink.getvalue()
>>> reader = pa.ipc.open_stream(buf)
>>> reader.read_all()
pyarrow.Table
indicators: struct<id: int16, name: string>
child 0, id: int16
child 1, name: string