Write Apache Arrow table to string C++
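I am trying to write an Apache Arrow table to a string. My larger example has problems, and I can't get this small example to work either. This one segfaults inside Arrow during the WriteTable call. My larger example doesn't seem to serialize correctly.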
#include <arrow/api.h>
#include <arrow/io/memory.h>
#include <arrow/ipc/api.h>

std::shared_ptr<arrow::Table> makeSimpleFakeArrowTable() {
    std::vector<std::shared_ptr<arrow::Field>> arrowFields;
    arrowFields.emplace_back(std::make_shared<arrow::Field>("Field1", arrow::int64()));
    arrowFields.emplace_back(std::make_shared<arrow::Field>("Field2", arrow::float64()));
    auto schema = std::make_shared<arrow::Schema>(arrowFields);

    std::vector<std::shared_ptr<arrow::Array>> columns(schema->num_fields());
    arrow::Int64Builder longBuilder;
    longBuilder.Append(20);
    longBuilder.Finish(&(columns.at(0)));
    arrow::DoubleBuilder doubleBuilder;
    doubleBuilder.Append(10.0);
    longBuilder.Finish(&(columns.at(1)));
    return arrow::Table::Make(schema, columns);
}

std::shared_ptr<arrow::RecordBatch>
getArrowBatchFromBytes(const std::string& bytes) {
    arrow::io::BufferReader arrowBufferReader{bytes};
    auto streamReader =
        arrow::ipc::RecordBatchStreamReader::Open(&arrowBufferReader).ValueOrDie();
    auto batch = streamReader->Next().ValueOrDie();
    return batch;
}

std::string arrowTableToByteString(const std::shared_ptr<arrow::Table>& table) {
    auto stream = arrow::io::BufferOutputStream::Create().ValueOrDie();
    auto batchWriter = arrow::ipc::MakeStreamWriter(stream, table->schema()).ValueOrDie();
    auto status = batchWriter->WriteTable(*table);
    if (not status.ok()) {
        throw std::runtime_error(
            "Couldn't write Arrow Table to byte string. Arrow status was: '" +
            status.ToString() + "'.");
    }
    std::shared_ptr<arrow::Buffer> buffer = stream->Finish().ValueOrDie();
    return buffer->ToHexString();
}

int main(int argc, char** argv) {
    auto simpleFakeArrowTable = makeSimpleFakeArrowTable();
    std::string tableAsByteString = arrowTableToByteString(simpleFakeArrowTable);
    auto batch = getArrowBatchFromBytes(tableAsByteString);
    assert(batch != nullptr);
}
Two things come to mind. First, I think this is a typo:
longBuilder.Finish(&(columns.at(0)));
arrow::DoubleBuilder doubleBuilder;
doubleBuilder.Append(10.0);
longBuilder.Finish(&(columns.at(1))); // Shouldn't this be doubleBuilder?
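Whenever you build an Arrow table yourself, it is a good idea to call arrow::Table::ValidateFull. It helps catch mistakes like this one (here, the returned status would report that the input arrays do not match the schema).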
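Second, once that is fixed, we still get an error because of return buffer->ToHexString();, which converts your byte array into a hex string, two characters per byte (for example, the bytes [10, 20, 30] become the bytes [48, 65, 49, 52, 49, 69], normally displayed as 0A141E).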
You then turn around and try to read those hex characters as a table with arrow::io::BufferReader arrowBufferReader{bytes};. If I change ToHexString to ToString, your example runs and returns 0.
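To make the difference concrete without pulling in Arrow, here is a small standard-library sketch. toHex and toRaw are hypothetical stand-ins for what ToHexString and ToString do to a buffer, assuming hex encoding emits two uppercase characters per byte; they are not Arrow's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for hex encoding: two uppercase hex characters per byte.
std::string toHex(const std::vector<std::uint8_t>& bytes) {
    static const char kDigits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes) {
        out.push_back(kDigits[b >> 4]);    // high nibble
        out.push_back(kDigits[b & 0x0F]);  // low nibble
    }
    return out;
}

// Hypothetical stand-in for a raw copy: the bytes are carried over unchanged.
std::string toRaw(const std::vector<std::uint8_t>& bytes) {
    return std::string(bytes.begin(), bytes.end());
}

// Shows why the hex form cannot be fed back to a binary reader as-is.
void checkEncodings() {
    const std::vector<std::uint8_t> bytes = {10, 20, 30};
    assert(toHex(bytes) == "0A141E");                // text, not the original bytes
    assert(toHex(bytes).size() == 2 * bytes.size()); // twice as long
    assert(toRaw(bytes).size() == bytes.size());     // same length
    assert(toRaw(bytes)[0] == 10);                   // same byte values
}
```

A reader that expects the IPC stream format sees the ASCII codes of the hex digits rather than the serialized bytes, which is why the round trip only works with the raw copy.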