How to continuously read a binary file in Crystal and get Bytes out of it?

Question

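Reading a binary file in Crystal is supposed to be done with Bytes.new(size) and File#read, but what if you don't know in advance how many bytes you'll read, and you want to keep reading one chunk at a time?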

Here's an example that reads 3 chunks from an imaginary file format in which each chunk of data is preceded by a single byte giving its length:

file = File.open "something.bin", "rb"

The following doesn't work, since Bytes can't be concatenated (it's really a Slice(UInt8), and slices can't be concatenated):

data = Bytes.new(0)

3.times do
    bytes_to_read = file.read_byte.not_nil!
    chunk = Bytes.new(bytes_to_read)
    file.read(chunk)
    data += chunk
end

The best thing I've come up with is to use an Array(UInt8) instead of Bytes, and call to_a on each chunk that is read:

data = [] of UInt8

3.times do
    bytes_to_read = file.read_byte.not_nil!
    chunk = Bytes.new(bytes_to_read)
    file.read(chunk)
    data += chunk.to_a
end

However, there then seems to be no way to turn that back into Bytes (Array#to_slice was removed), which is needed for many applications and is recommended by the authors as the type for all binary data.

So... how do I keep reading from a file, appending each chunk to the end of the previous data, and get Bytes out of it?

Answer 1

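One solution would be to copy the data into a resized Bytes on every iteration. You could also collect the Bytes instances in a container (e.g. an Array) and merge them at the end, but both approaches mean additional copy operations.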
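
As a rough sketch of the "collect and merge at the end" variant: concat_chunks is a hypothetical helper name, not something from the standard library, and the chunks are hard-coded here instead of being read from a file. It allocates one result buffer and copies each chunk into it with Slice#copy_to.

```crystal
# Hypothetical helper: merges already-read chunks into a single Bytes.
def concat_chunks(chunks : Array(Bytes)) : Bytes
  total = chunks.sum(&.size)
  result = Bytes.new(total)
  offset = 0
  chunks.each do |chunk|
    # `result + offset` is a view of `result` starting at `offset`
    chunk.copy_to(result + offset)
    offset += chunk.size
  end
  result
end

# Stand-in for chunks that would normally come from the file
chunks = [Bytes[1, 2], Bytes[3], Bytes[4, 5, 6]]
data = concat_chunks(chunks)
```

Every chunk is copied once more after being read, which is the extra cost mentioned above.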

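The best solution would probably be to use a buffer that is large enough to fit all the data that could possibly be read, or at least is very likely to be (resize if necessary). If the maximum size is just 3 * 255 bytes, this is a no-brainer. If the buffer turns out to be too large, you can size it down at the end.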

data = Bytes.new 3 * UInt8::MAX
bytes_read = 0
3.times do
  bytes_to_read = file.read_byte.not_nil!
  # fill exactly bytes_to_read bytes, starting at the current offset
  file.read_fully(data[bytes_read, bytes_to_read])
  bytes_read += bytes_to_read
end
# resize to actual size at the end:
data = data[0, bytes_read]

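Note: since the data format tells you how many bytes to read, you should use read_fully instead of read. read silently returns fewer bytes when less data is actually available, whereas read_fully raises IO::EOFError in that case.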
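
To illustrate the difference, here is a small sketch that uses an IO::Memory as a stand-in for a truncated file (the byte values are made up): the length byte promises 4 bytes, but only 2 follow.

```crystal
# In-memory stand-in for a truncated file: length byte 4, then only 2 data bytes
io = IO::Memory.new(Bytes[4, 0xAA, 0xBB])

len = io.read_byte.not_nil!
chunk = Bytes.new(len)
# read returns the number of bytes it actually got and does not raise
count = io.read(chunk)

# Start over and try the same chunk with read_fully
io.rewind
io.read_byte
eof_raised = false
begin
  io.read_fully(Bytes.new(len))
rescue IO::EOFError
  # read_fully refuses to return a partially filled slice
  eof_raised = true
end
```

With read, the short count has to be checked by hand; with read_fully, a truncated chunk surfaces as an exception.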

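Edit: Since the number of chunks, and thus the maximum size, is not known in advance (per the comments), you should use a dynamically resizing buffer. This can be implemented easily with IO::Memory, which takes care of growing the buffer when necessary.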

io = IO::Memory.new
loop do
  bytes_to_read = file.read_byte
  break if bytes_to_read.nil?
  IO.copy(file, io, bytes_to_read)
end
data = io.to_slice
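
IO.copy returns the number of bytes it actually copied and does not raise when the source ends early, so a stricter variant could compare that count against the length byte. The following is only a sketch under that assumption: source is an in-memory stand-in for the file, and raising IO::EOFError on a short chunk is a design choice, not part of the original answer.

```crystal
# In-memory stand-in for the file: chunk [10, 20], then chunk [30]
source = IO::Memory.new(Bytes[2, 10, 20, 1, 30])
buffer = IO::Memory.new

# read_byte returns nil at end of input, which ends the loop
while len = source.read_byte
  copied = IO.copy(source, buffer, len)
  # Assumption: a chunk shorter than announced is treated as an error
  raise IO::EOFError.new("chunk truncated") if copied < len
end

data = buffer.to_slice
```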

binaryfiles

crystal-lang