Return Result<Cow<[u8]>> instead of Result<Vec<u8>>

Question

I would like to change this function to return a Result<Cow<[u8]>>, to avoid needlessly copying an entire file into memory.

pub(crate) fn get_reader_bytes<R: Read + MmapBytesReader>(reader: &mut R) -> Result<Vec<u8>> {
    // we have a file so we can mmap
    if let Some(file) = reader.to_file() {
        let mmap = unsafe { memmap::Mmap::map(file)? };
        Ok(mmap[..].to_vec())
    } else {
        // we can get the bytes for free
        if let Some(bytes) = reader.to_bytes() {
            Ok(bytes.to_vec())
        } else {
            // we have to read to an owned buffer to get the bytes.
            let mut bytes = Vec::with_capacity(1024 * 128);
            reader.read_to_end(&mut bytes)?;
            // append a trailing newline only if the data does not already end with one
            if !bytes.is_empty()
                && (bytes[bytes.len() - 1] != b'\n' && bytes[bytes.len() - 1] != b'\r')
            {
                bytes.push(b'\n')
            }
            Ok(bytes)
        }
    }
}

I tried the code below, but got error[E0515]: cannot return value referencing local data `*mmap`, which makes sense.

pub(crate) fn get_reader_bytes<R: Read + MmapBytesReader>(reader: &mut R) -> Result<Cow<[u8]>> {
    // we have a file so we can mmap
    if let Some(file) = reader.to_file() {
        let mmap = unsafe { memmap::Mmap::map(file)? };
        Ok(Cow::Borrowed(&mmap[..]))
    } else {
        // we can get the bytes for free
        if let Some(bytes) = (*reader).to_bytes() {
            Ok(Cow::Borrowed(bytes))
        } else {
            // we have to read to an owned buffer to get the bytes.
            let mut bytes = Vec::with_capacity(1024 * 128);
            reader.read_to_end(&mut bytes)?;
            // append a trailing newline only if the data does not already end with one
            if !bytes.is_empty()
                && (bytes[bytes.len() - 1] != b'\n' && bytes[bytes.len() - 1] != b'\r')
            {
                bytes.push(b'\n')
            }
            Ok(Cow::Owned(bytes))
        }
    }
}

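I'm not sure how to proceed. Do I need to create the mmap before calling the function and pass it in as a mutable reference? Or as an Option of a mutable reference? Or as a mutable reference to an Option?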

Answer 1

      if let Some(file) = reader.to_file() {
          let mmap = unsafe { memmap::Mmap::map(file)? };
          Ok(Cow::Borrowed(&mmap[..]))

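The problem starts here: you are creating a new Mmap, which is responsible for the existence of the memory mapping, as a local variable. The Mmap is therefore dropped at the end of the function, and the reference becomes invalid because the memory mapping no longer exists.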
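To make the ownership point concrete, here is a minimal std-only sketch. The `Owner` type and the function names are hypothetical stand-ins (they are not part of memmap or of the original code); `Owner` plays the role of the `Mmap`, which owns the bytes that the returned slice would point into.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for a value that owns its bytes, the way
// `memmap::Mmap` owns its mapping.
struct Owner {
    data: Vec<u8>,
}

// This version would be rejected with E0515: `owner` is dropped when the
// function returns, while the returned slice would still point into it.
//
// fn borrowed_from_local<'a>() -> Cow<'a, [u8]> {
//     let owner = Owner { data: vec![1, 2, 3] };
//     Cow::Borrowed(&owner.data[..])
// }

// Returning the owner itself compiles: the caller keeps the bytes alive.
fn return_owner() -> Owner {
    Owner { data: vec![1, 2, 3] }
}

// Borrowing is fine once the owner lives in the caller's scope.
fn view(owner: &Owner) -> Cow<'_, [u8]> {
    Cow::Borrowed(&owner.data[..])
}
```

So creating the owner before the call and passing a reference in would work; the alternative is to hand the owner back to the caller as part of the return value.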

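The closest thing you can do is return the Mmap itself, or rather, in this case, return an enum of your own design instead of Cow, which can provide the logic to borrow from any of its variants (just as Cow does):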

enum MyBytes<'a> {
    Borrowed(&'a [u8]),
    Owned(Vec<u8>),
    Mapped(memmap::Mmap),
}

impl std::ops::Deref for MyBytes<'_> {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        match self {
            Self::Borrowed(ref_bytes) => ref_bytes,
            Self::Owned(vec) => &vec,
            Self::Mapped(mmap) => &mmap,
        }
    }
}
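As a rough illustration of why the Deref impl matters, the sketch below repeats the enum with a hypothetical `FakeMmap` type substituted for `memmap::Mmap`, so that it builds with the standard library alone. `FakeMmap` and `count_lines` are assumptions made for this example only.

```rust
use std::ops::Deref;

// Hypothetical stand-in for `memmap::Mmap`; it only needs to deref to `[u8]`.
struct FakeMmap(Vec<u8>);

impl Deref for FakeMmap {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        &self.0
    }
}

enum MyBytes<'a> {
    Borrowed(&'a [u8]),
    Owned(Vec<u8>),
    Mapped(FakeMmap),
}

impl Deref for MyBytes<'_> {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        match self {
            Self::Borrowed(ref_bytes) => ref_bytes,
            Self::Owned(vec) => vec,
            Self::Mapped(mmap) => mmap,
        }
    }
}

// Callers take `&[u8]`, so they never need to know which variant they got.
fn count_lines(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b == b'\n').count()
}
```

Passing `&my_bytes` to `count_lines` works for every variant through deref coercion, which is the same convenience Cow gives for its two cases.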

Then use that enum in get_reader_bytes:

pub(crate) fn get_reader_bytes<R: Read + MmapBytesReader>(reader: &mut R) -> Result<MyBytes<'_>, std::io::Error> {
    // we have a file so we can mmap
    if let Some(file) = reader.to_file() {
        let mmap = unsafe { memmap::Mmap::map(file)? };
        Ok(MyBytes::Mapped(mmap))
    } else {
        // we can get the bytes for free
        if reader.to_bytes().is_some() {
            Ok(MyBytes::Borrowed(reader.to_bytes().unwrap()))
        } else {
            // we have to read to an owned buffer to get the bytes.
            let mut bytes = Vec::with_capacity(1024 * 128);
            reader.read_to_end(&mut bytes)?;
            // append a trailing newline only if the data does not already end with one
            if !bytes.is_empty()
                && (bytes[bytes.len() - 1] != b'\n' && bytes[bytes.len() - 1] != b'\r')
            {
                bytes.push(b'\n')
            }
            Ok(MyBytes::Owned(bytes))
        }
    }
}

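Note the awkwardness in the Borrowed case: I had to call to_bytes() twice. This is because the borrow checker does not currently support the pattern where you do something with a mutable reference and then either return a borrow that depends on it, or drop that borrow and do something else. It assumes the borrow unconditionally extends to the end of the function, which prevents you from doing anything else with the mutable reference. So in this case we have to separate the check for whether a byte slice is available from the operation of returning that slice.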
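The same limitation can be reproduced with the standard library alone. The following is a sketch on a `HashMap` rather than on the reader from the question; `get_or_insert` is a hypothetical helper written for this illustration. The commented-out version is the natural way to write it and is rejected by the current borrow checker; the version below it applies the same "check first, borrow again on the returning branch" workaround.

```rust
use std::collections::HashMap;

// Rejected by the current borrow checker, although it is sound:
//
// fn get_or_insert(map: &mut HashMap<u32, String>, key: u32) -> &String {
//     if let Some(v) = map.get(&key) {
//         return v; // this borrow is treated as lasting for the whole function
//     }
//     map.insert(key, String::from("default")); // error: `*map` is still borrowed
//     &map[&key]
// }

// Workaround: first ask a yes/no question that returns no borrow, then
// borrow again only on the branch that returns.
fn get_or_insert(map: &mut HashMap<u32, String>, key: u32) -> &String {
    if map.contains_key(&key) {
        return &map[&key];
    }
    map.insert(key, String::from("default"));
    &map[&key]
}
```

The cost is a second lookup, just as the answer's code pays for a second to_bytes() call.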

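A better solution would be to put the reading logic inside the MmapBytesReader trait implementations (possibly in the form of a separate function called by the impls, so that multiple impls can share code), so that there is no if branch at all to confuse the borrow checker.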
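One possible shape for that refactoring is sketched below. Everything here is an assumption made for illustration: the trait `ReaderBytes`, the wrapper `Stream`, and the helper `read_owned` are hypothetical names, the enum is reduced to two variants to stay std-only, and the trailing-newline fix-up is left out. The real MmapBytesReader trait may look quite different.

```rust
use std::io::{self, Cursor, Read};

// Reduced version of the enum from above (no mmap variant, std only).
enum MyBytes<'a> {
    Borrowed(&'a [u8]),
    Owned(Vec<u8>),
}

impl MyBytes<'_> {
    fn as_slice(&self) -> &[u8] {
        match self {
            Self::Borrowed(bytes) => bytes,
            Self::Owned(vec) => vec,
        }
    }
}

// Shared fallback that any impl can call: read everything into a buffer.
fn read_owned<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut bytes = Vec::with_capacity(1024 * 128);
    reader.read_to_end(&mut bytes)?;
    Ok(bytes)
}

// Hypothetical trait: each reader type decides how to produce its bytes.
trait ReaderBytes {
    fn reader_bytes(&mut self) -> io::Result<MyBytes<'_>>;
}

// In-memory source: the bytes already exist, so lend them out.
impl ReaderBytes for Cursor<Vec<u8>> {
    fn reader_bytes(&mut self) -> io::Result<MyBytes<'_>> {
        Ok(MyBytes::Borrowed(self.get_ref().as_slice()))
    }
}

// Stream-like source: nothing to borrow, so read into an owned buffer.
struct Stream<R>(R);

impl<R: Read> ReaderBytes for Stream<R> {
    fn reader_bytes(&mut self) -> io::Result<MyBytes<'_>> {
        Ok(MyBytes::Owned(read_owned(&mut self.0)?))
    }
}
```

Because each impl has a single straight-line body, no function both returns a borrow conditionally and mutates the reader afterwards, so the double call is no longer needed.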

file-io

rust