Read CSV from std::vector<unsigned char> using Apache Arrow

I am trying to read CSV input using Apache Arrow. The example here mentions that the input should be an InputStream, but in my case I just have a std::vector of unsigned chars. Is it possible to parse this using Apache Arrow? I have checked the I/O interface for an "in-memory" data structure, with no luck. For convenience, I have copied the example code here, along with my input data:

#include "arrow/csv/api.h"

{
   // ...
   std::vector<unsigned char> data;
   arrow::io::IOContext io_context = arrow::io::default_io_context();
   // how can I fit the std::vector to the input stream? 
   std::shared_ptr<arrow::io::InputStream> input = ...;

   auto read_options = arrow::csv::ReadOptions::Defaults();
   auto parse_options = arrow::csv::ParseOptions::Defaults();
   auto convert_options = arrow::csv::ConvertOptions::Defaults();

   // Instantiate TableReader from input stream and options
   auto maybe_reader =
     arrow::csv::TableReader::Make(io_context,
                                   input,
                                   read_options,
                                   parse_options,
                                   convert_options);
   if (!maybe_reader.ok()) {
      // Handle TableReader instantiation error...
   }
   std::shared_ptr<arrow::csv::TableReader> reader = *maybe_reader;

   // Read table from CSV file
   auto maybe_table = reader->Read();
   if (!maybe_table.ok()) {
      // Handle CSV read error
      // (for example a CSV syntax error or failed type conversion)
   }
   std::shared_ptr<arrow::Table> table = *maybe_table;
}

Any help would be appreciated!

The I/O interface documentation lists BufferReader, which works as an in-memory input stream. Although the docs do not mention it, a BufferReader can be constructed from a pointer and a size, which should let you use the contents of your vector.