Python 3: gzip.open() and modes

https://docs.python.org/3/library/gzip.html

I'm considering using gzip.open(), and I'm a bit confused about the mode argument:

The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x' or 'xb' for binary mode, or 'rt', 'at', 'wt', or 'xt' for text mode. The default is 'rb'.

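So what is the difference between 'w' and 'wb'?

The documentation states that both are binary modes.

Does that mean there is no difference between 'w' and 'wb'?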

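It means that r defaults to rb, and if you want text you have to ask for it explicitly with rt.

(This is the opposite of the built-in open, where r means rt, not rb.)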
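
A quick way to see this for yourself is the sketch below. It writes a small gzip file into a temporary directory and reads it back with the default mode, with 'r', and with 'rt'. The helper name read_with_modes and the file name demo.gz are made up for illustration only.

```python
import gzip
import os
import tempfile


def read_with_modes(payload: bytes = b"hello gzip\n"):
    """Write payload to a temporary .gz file and read it back in three modes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.gz")  # hypothetical file name
        with gzip.open(path, "wb") as f:
            f.write(payload)

        with gzip.open(path) as f:  # no mode given: the default is "rb"
            default_data = f.read()
        with gzip.open(path, "r") as f:  # "r" is still binary for gzip.open
            r_data = f.read()
        with gzip.open(path, "rt", encoding="utf-8") as f:  # text must be requested
            text_data = f.read()
    return default_data, r_data, text_data


default_data, r_data, text_data = read_with_modes()
# The first two results are bytes objects, the third is a str.
print(type(default_data).__name__, type(r_data).__name__, type(text_data).__name__)
```

With the built-in open, the same 'r' would hand back str instead of bytes, which is the difference noted above.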

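As you said, and as already covered by @Jean-François Fabre's answer.
I just wanted to show some code, since it was fun.
Let's have a look at the gzip.py source code in the Python library to see what actually happens.
gzip.open() can be found here https://github.com/python/cpython/blob/master/Lib/gzip.py and I reproduce it below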

def open(filename, mode="rb", compresslevel=9,
         encoding=None, errors=None, newline=None):
    """Open a gzip-compressed file in binary or text mode.
    The filename argument can be an actual filename (a str or bytes object), or
    an existing file object to read from or write to.
    The mode argument can be "r", "rb", "w", "wb", "x", "xb", "a" or "ab" for
    binary mode, or "rt", "wt", "xt" or "at" for text mode. The default mode is
    "rb", and the default compresslevel is 9.
    For binary mode, this function is equivalent to the GzipFile constructor:
    GzipFile(filename, mode, compresslevel). In this case, the encoding, errors
    and newline arguments must not be provided.
    For text mode, a GzipFile object is created, and wrapped in an
    io.TextIOWrapper instance with the specified encoding, error handling
    behavior, and line ending(s).
    """
    if "t" in mode:
        if "b" in mode:
            raise ValueError("Invalid mode: %r" % (mode,))
    else:
        if encoding is not None:
            raise ValueError("Argument 'encoding' not supported in binary mode")
        if errors is not None:
            raise ValueError("Argument 'errors' not supported in binary mode")
        if newline is not None:
            raise ValueError("Argument 'newline' not supported in binary mode")

    gz_mode = mode.replace("t", "")
    if isinstance(filename, (str, bytes, os.PathLike)):
        binary_file = GzipFile(filename, gz_mode, compresslevel)
    elif hasattr(filename, "read") or hasattr(filename, "write"):
        binary_file = GzipFile(None, gz_mode, compresslevel, filename)
    else:
        raise TypeError("filename must be a str or bytes object, or a file")

    if "t" in mode:
        return io.TextIOWrapper(binary_file, encoding, errors, newline)
    else:
        return binary_file  

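A few things we notice:

  • The default mode is rb, as stated in the documentation you quoted
  • When opening a binary file, it does not matter whether the mode is, for example, "r" or "rb", or "w" or "wb".
    We can see this in the following lines: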

    gz_mode = mode.replace("t", "")
    if isinstance(filename, (str, bytes, os.PathLike)):
        binary_file = GzipFile(filename, gz_mode, compresslevel)
    elif hasattr(filename, "read") or hasattr(filename, "write"):
        binary_file = GzipFile(None, gz_mode, compresslevel, filename)
    else:
        raise TypeError("filename must be a str or bytes object, or a file")
    
    if "t" in mode:
        return io.TextIOWrapper(binary_file, encoding, errors, newline)
    else:
        return binary_file
    

    Basically, binary_file gets built whether or not the extra b is present, since at this point gz_mode may or may not contain a b.
    Next, the class GzipFile(_compression.BaseStream) is called to build binary_file.
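
The checks at the top of gzip.open() can also be exercised directly. The sketch below is only an illustration (the helper name check_modes and the dictionary keys are invented here): it passes in-memory io.BytesIO objects instead of real files, and records what happens for an invalid mode, for a binary mode combined with encoding, and for plain binary and text modes.

```python
import gzip
import io


def check_modes():
    """Collect what gzip.open() does for a few mode/argument combinations."""
    results = {}

    # "t" and "b" in the same mode string are rejected up front.
    try:
        gzip.open(io.BytesIO(), "wtb")
    except ValueError as exc:
        results["wtb"] = str(exc)

    # Text-only arguments such as encoding are rejected in binary mode.
    try:
        gzip.open(io.BytesIO(), "w", encoding="utf-8")
    except ValueError as exc:
        results["w+encoding"] = str(exc)

    # Binary mode returns the GzipFile itself.
    with gzip.open(io.BytesIO(), "w") as f:
        results["w"] = type(f).__name__

    # Text mode returns the GzipFile wrapped in an io.TextIOWrapper.
    with gzip.open(io.BytesIO(), "wt", encoding="utf-8") as f:
        results["wt"] = type(f).__name__

    return results


results = check_modes()
for key, value in results.items():
    print(key, "->", value)
```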

In the constructor, the following lines are the important ones:

    if mode and ('t' in mode or 'U' in mode):
        raise ValueError("Invalid mode: {!r}".format(mode))
    if mode and 'b' not in mode:
        mode += 'b'
    if fileobj is None:
        fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
    if filename is None:
        filename = getattr(fileobj, 'name', '')
        if not isinstance(filename, (str, bytes)):
            filename = ''
    else:
        filename = os.fspath(filename)
    if mode is None:
        mode = getattr(fileobj, 'mode', 'rb')

You can clearly see that if 'b' is not present in the mode, it gets added:

    if mode and 'b' not in mode:
        mode += 'b'

So, as already discussed, there is no difference between the two modes.
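
If you want to confirm this without reading the source, a small experiment along these lines should do. It is only a sketch: the helper names roundtrip and accepts_str and the file name demo.gz are invented for the example. It writes the same bytes with 'w' and with 'wb', reads them back, and also checks that neither mode accepts a str.

```python
import gzip
import os
import tempfile


def roundtrip(mode: str, payload: bytes) -> bytes:
    """Write payload with the given write mode, then read it back as bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.gz")  # hypothetical file name
        with gzip.open(path, mode) as f:
            f.write(payload)
        with gzip.open(path, "rb") as f:
            return f.read()


def accepts_str(mode: str) -> bool:
    """Return True if a file opened with this mode accepts a str in write()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.gz")
        with gzip.open(path, mode) as f:
            try:
                f.write("text")  # a binary-mode file should refuse a str
            except TypeError:
                return False
    return True


payload = b"same bytes either way\n"
print(roundtrip("w", payload) == roundtrip("wb", payload))
print(accepts_str("w"), accepts_str("wb"))
```

The comparison is done on the decompressed data rather than on the raw .gz bytes, because the gzip header carries a timestamp and the raw files are not guaranteed to be byte-identical.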