Python mmap.mmap() to bytes-like object?

Question

The documentation for mmap says that "Memory-mapped file objects behave like both bytearray and like file objects."

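However, that does not seem to extend to a standard for loop: at least for Python 3.8.5 on Linux, which I am currently using, each mmap.mmap() iterator element is a single-byte bytes, whereas for both a bytearray and normal file access each element is an int. Update. Correction: for normal file access it is a variable-sized bytes; see below.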

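Why is that? More importantly, how can I efficiently get a bytes-like object from an mmap, that is, one where not only indexing but also for gives me an int? (By efficiently, I mean that I would like to avoid additional copying, casting, etc.)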

Here is code that demonstrates the behavior:

#!/usr/bin/env python3.8

def print_types(desc, x):
    for el in setmm: break   ### UPDATE: bug here, `setmm` should be `x`, see comments
    # `el` is now the first element of `x`
    print('%-30s: type is %-30s, first element is %s' % (desc,type(x),type(el)))
    try:
        print('%72s(first element size is %d)' % (' ', len(el)))
    except TypeError:
        pass  # ignore failure if `el` doesn't support `len()` (e.g. an int)

setmm = bytearray(b'hoi!')
print_types('bytearray', setmm)

with open('set.mm', 'rb') as f:
    print_types('file object', f)

with open('set.mm', 'rb') as f:
    setmm = f.read()
    print_types('file open().read() result', setmm)

import mmap
with open('set.mm', 'rb') as f:
    setmm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    print_types('file mmap.mmap() result', setmm)

This results in

bytearray                     : type is <class 'bytearray'>           , first element type is <class 'int'>
file object                   : type is <class '_io.BufferedReader'>  , first element type is <class 'int'>
file open().read() result     : type is <class 'bytes'>               , first element type is <class 'int'>
file mmap.mmap() result       : type is <class 'mmap.mmap'>           , first element type is <class 'bytes'>
                                                                        (first element size is 1)

Update. With the bug that furas kindly pointed out in the comments fixed, the result becomes

bytearray                     : type is <class 'bytearray'>           , first element is <class 'int'>
file object                   : type is <class '_io.BufferedReader'>  , first element is <class 'bytes'>
                                                                        (first element size is 38)
file open().read() result     : type is <class 'bytes'>               , first element is <class 'int'>
file mmap.mmap() result       : type is <class 'mmap.mmap'>           , first element is <class 'bytes'>
                                                                        (first element size is 1)

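This answers what happens: for some reason, iterating over an mmap is like iterating over a file in that it returns a bytes every time, but instead of full lines as for a file, it yields single-byte chunks.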
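The mismatch between indexing and iteration can be isolated in a few lines. The following is a minimal sketch, not part of the original post: the helper name `element_types` and the temporary file are assumptions made for illustration, and `access=mmap.ACCESS_READ` is used in place of `prot=mmap.PROT_READ` only so the snippet is not Unix-specific.

```python
import mmap
import tempfile

def element_types(data=b"hoi!\n"):
    """Return (type of mm[0], type of the first iterated element) for an mmap of `data`.

    `element_types` is a hypothetical helper used only for this illustration.
    """
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()  # make sure the bytes are in the file before mapping it
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            indexed = mm[0]           # integer indexing
            iterated = next(iter(mm)) # what a `for` loop would see first
            return type(indexed), type(iterated)

indexed_type, iterated_type = element_types()
print("mm[0]          ->", indexed_type.__name__)
print("next(iter(mm)) ->", iterated_type.__name__)
```

Comparing the two printed type names shows directly whether indexing and iteration agree on the interpreter at hand.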

Still, my main question remains unchanged: how can I efficiently have an mmap behave like a bytes-like object (i.e., both indexing and for give int)?

Answer 1

How can I efficiently have an mmap behave like a bytes-like object (i.e., both indexing and for give int)?

bytes is an object that holds its data in memory. But the whole point of mmap is not to load all the data into memory.

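If you want a bytes object containing the entire contents of the file, open() the file normally and read() the entire contents. Using mmap() for that is working against yourself.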

Perhaps you want to use a memoryview, which can be constructed from a bytes or an mmap() and will give you a uniform API.
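A minimal sketch of the memoryview suggestion, under a few assumptions: the helper name `first_elements` and the temporary file are made up for illustration, and `access=mmap.ACCESS_READ` stands in for the question's `prot=mmap.PROT_READ` for portability. A memoryview wraps the mapped buffer without copying it, and for a byte buffer both indexing and iteration yield int. The view is released (here via `with`) before the mmap is closed, since closing an mmap while a buffer export is still alive raises BufferError.

```python
import mmap
import tempfile

def first_elements(buf):
    """Return (indexed element, first iterated element) through a memoryview of `buf`.

    `first_elements` is a hypothetical helper used only for this illustration.
    """
    with memoryview(buf) as view:  # no copy of the underlying data
        indexed = view[0]
        iterated = next(iter(view))
    return indexed, iterated

with tempfile.TemporaryFile() as f:
    f.write(b"hoi!\n")
    f.flush()  # make sure the bytes are in the file before mapping it
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        from_mmap = first_elements(mm)
        with memoryview(mm) as view:
            total = sum(view)  # the loop inside sum() sees ints

from_bytes = first_elements(b"hoi!\n")
print("mmap via memoryview :", from_mmap)
print("bytes via memoryview:", from_bytes)
```

The same helper works unchanged on bytes, bytearray, and mmap, which is the uniform API the answer refers to.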
