Length of file in file handling

Question

I tried to find the length of a text file in Python in different ways, but I don't understand why they show the output they do.

Text file:

hello 
this is a sample text file.
say hi to python

First attempt:

size_to_read = 8

with open('sample.txt', "r") as f:
    f_contents = f.read(size_to_read)
    print(f'the total length of file is {len(f_contents)}')
    while len(f_contents) > 0:
        print(f_contents, end="**")
        f_contents = f.read(size_to_read)

Output:

Second attempt:

size_to_read = 8

with open('sample.txt', "r") as f:
    f_contents = f.read(size_to_read)
    print(f'the total length of file is {len(f.read())}')
    while len(f_contents) > 0:
        print(f_contents, end="**")
        f_contents = f.read(size_to_read)

Output:

Third attempt:

size_to_read = 8

with open('sample.txt', "r") as f:
    f_contents = f.read(size_to_read)
    print(f'the total length of file is {len(f.readline())}')
    while len(f_contents) > 0:
        print(f_contents, end="**")
        f_contents = f.read(size_to_read)

Output:

Can anyone explain why these 3 give different outputs?

Answer 1

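In the first attempt, f_contents = f.read(size_to_read) reads the first 8 characters (the file is opened in text mode, so read(n) counts characters rather than bytes), and print(f'the total length of file is {len(f_contents)}') just prints the length of that variable, which is 8. After that, each iteration of the loop reads the next 8 characters.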

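In the second attempt you also read 8 characters with f_contents = f.read(size_to_read), but print(f'the total length of file is {len(f.read())}') then reads everything from the current position to the end of the file, so it prints the number of characters left after the first 8, not the total. The loop prints the first 8 characters once, and the f_contents = f.read(size_to_read) in its last line then starts at the end of the file and returns an empty string, so the loop stops.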

In the third attempt you again read 8 characters with f_contents = f.read(size_to_read), but print(f'the total length of file is {len(f.readline())}') then reads from that position to the end of the current line, so it prints the length of the rest of that line. The loop's print(f_contents, end="**") still prints the first 8 characters, which are saved in f_contents, but the f_contents = f.read(size_to_read) in the last line does not continue right after those 8 characters. Because of the readline() call, it continues from the start of the next line.
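The three attempts can be replayed on an in-memory copy of the text to see the numbers involved. This is only a sketch: it assumes the file contains exactly the three lines from the question, with a trailing space after "hello" and no newline at the end, and it uses io.StringIO in place of sample.txt. The helper name run_attempt is made up for illustration.

```python
import io

# Assumed contents of sample.txt (trailing space after "hello", no final newline).
SAMPLE = "hello \nthis is a sample text file.\nsay hi to python"

def run_attempt(measure):
    """Read 8 chars, call measure(f, first) for the 'length', then loop in 8-char chunks."""
    f = io.StringIO(SAMPLE)
    first = f.read(8)
    reported = measure(f, first)
    chunks = []
    chunk = first
    while len(chunk) > 0:
        chunks.append(chunk)   # what the question's loop would print
        chunk = f.read(8)
    return reported, chunks

attempt1 = run_attempt(lambda f, first: len(first))         # len(f_contents)
attempt2 = run_attempt(lambda f, first: len(f.read()))      # len(f.read())
attempt3 = run_attempt(lambda f, first: len(f.readline()))  # len(f.readline())

for name, (reported, chunks) in [("1", attempt1), ("2", attempt2), ("3", attempt3)]:
    print(f"attempt {name}: reported {reported}, chunks {chunks}")
```

Under these assumptions the first attempt reports 8 and prints the whole file in chunks, the second reports the 43 characters that remain and prints only the first chunk, and the third reports the 27 characters left on the second line and then prints the first chunk followed by the last line.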

Answer 2

To avoid getting confused, think about two things:

First, where are you in the file right now? (Think of it like a text cursor.) Second, how much data are you reading?

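With read() and no argument, you read from where you are until the end of the file. With an argument, you read at most that many characters in text mode (or bytes in binary mode).
With readline(), you read one line, from where you are now until the end of that line.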

However much data you read, your position ends up right after it, so the next read starts from there.
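A small sketch of that "cursor" idea, again using io.StringIO with the assumed file contents instead of the real file. For a StringIO object tell() reports the position as a character index, which makes the movement easy to follow; on a real text file the value returned by tell() should be treated as an opaque marker.

```python
import io

# Assumed contents of sample.txt (trailing space after "hello", no final newline).
SAMPLE = "hello \nthis is a sample text file.\nsay hi to python"

f = io.StringIO(SAMPLE)
positions = [f.tell()]       # start of the text

f.read(8)                    # fixed amount: moves 8 characters forward
positions.append(f.tell())

f.readline()                 # rest of the current line, including its newline
positions.append(f.tell())

f.read()                     # everything that is left
positions.append(f.tell())

at_end = f.read(8)           # nothing left: an empty string comes back

print(positions, repr(at_end))
```

Each call starts where the previous one stopped, and once the position is at the end every further read returns an empty string.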

Hope that clears up your confusion.

Answer 3

It comes down to the current read position in the file.

First case: initially you read 8 characters, and then each iteration of the while loop reads 8 more. That is why it prints 8 characters at a time.

f_contents = f.read(size_to_read)  # read 8 bytes

Note: you did not call a read function when printing the length.

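Second case: initially you read 8 characters, but when printing the length you call read(), which by default reads to the end of the file. So the while loop prints the first 8 characters once and then gets nothing more, because the position is already at the end of the file.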

f_contents = f.read(size_to_read)  # reading 8 bytes
print(f'the total length of file is {len(f.read())}') # read rest of the file

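Third case: initially you read 8 characters, but when printing the length you call readline(), which reads the rest of the current line and leaves the position at the start of the next line. The loop prints the first 8 characters and then continues from that position, 8 characters per iteration.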

f_contents = f.read(size_to_read)  # read 8 bytes
print(f'the total length of file is {len(f.readline())}')  # read 1 line

For a better understanding, use f.tell() to get the current position in the file. See the example below for case 1:

size_to_read = 8

with open('sample.txt', "r") as f:
    f_contents = f.read(size_to_read)
    print(f'the total length of file is {len(f_contents)}')
    while len(f_contents) > 0:
        print(f_contents, end="**")
        print("\nbytes reached -- {}".format(f.tell())) #current state here
        f_contents = f.read(size_to_read)
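None of the three attempts reports the length of the whole file, because each one measures something after the position has already moved. One possible way to get the total is to measure before anything else is read, either as characters with len(f.read()) on a freshly opened file or as bytes with os.path.getsize. The sketch below writes the assumed sample text to a temporary file so that it runs on its own; the helper name measure and the temporary path are made up for illustration.

```python
import os
import tempfile

# Assumed contents of sample.txt (trailing space after "hello", no final newline).
SAMPLE = "hello \nthis is a sample text file.\nsay hi to python"

def measure(path):
    """Return (size in bytes on disk, number of characters read in text mode)."""
    size_in_bytes = os.path.getsize(path)
    # newline="" turns off newline translation so characters map 1:1 to what was written.
    with open(path, "r", encoding="utf-8", newline="") as f:
        size_in_chars = len(f.read())   # position is at 0, so this is the whole file
    return size_in_bytes, size_in_chars

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(SAMPLE)
    sizes = measure(path)

print(sizes)
```

For this ASCII-only text both numbers are the same. They can differ when the file contains non-ASCII characters or when newline translation is in effect, since one counts bytes and the other counts characters.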


python

file-handling