Why doesn't Python see all the lines in a file?

Question

I count the number of lines in a file with Python, using the following method:

```
n = 0
for line in file('input.txt'):
    n += 1
print n
```

I run this script on Windows.

Then I count the lines in the same file with a Unix command:

```
wc -l input.txt
```

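Counting with the Unix command gives a significantly larger number of lines.

So my question is: why doesn't Python see all the lines in the file? Or is it a matter of definition?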

Answer 1

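Your file most likely contains one or more DOS EOF (CTRL-Z) characters, ASCII code point 0x1A. When Windows opens a file in text mode, it still honours the old DOS semantics and ends the file whenever it reads that character. See Line reading chokes on 0x1A.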
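
To check whether this is what is happening with your file, you can scan the raw bytes for 0x1A. The following is a minimal sketch, assuming the file fits comfortably in memory; `find_ctrl_z` is a hypothetical helper name, not part of any library:

```python
def find_ctrl_z(filename):
    """Return the byte offsets of every 0x1A (CTRL-Z) byte in the file."""
    # Binary mode: the bytes are returned as stored, with no EOF handling.
    with open(filename, 'rb') as f:
        data = f.read()
    offsets = []
    pos = data.find(b'\x1a')
    while pos != -1:
        offsets.append(pos)
        pos = data.find(b'\x1a', pos + 1)
    return offsets
```

An empty list means the file contains no CTRL-Z bytes, and the difference in line counts probably has another cause.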

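Only opening the file in binary mode bypasses this behaviour. To do that and still count lines, you have two options:

Read in chunks, then count the line separators in each chunk: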

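```
import os

def bufcount(filename, linesep=os.linesep.encode('ascii'), buf_size=2 ** 15):
    """Count line separators in a file opened in binary mode."""
    lines = 0
    with open(filename, 'rb') as f:
        last = b''
        # Read buf_size bytes at a time until read() returns an empty result.
        for buf in iter(lambda: f.read(buf_size), b''):
            lines += buf.count(linesep)
            if last and last + buf[:1] == linesep:
                # count a two-byte line separator straddling a chunk boundary
                lines += 1
            if len(linesep) > 1:
                # remember the final byte for the next boundary check
                last = buf[-1:]
    return lines
```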

Take into account that on Windows os.linesep is set to \r\n; adjust as needed for your file. In binary mode the line separator is not translated to \n.
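
If you are not sure which separator your file uses, you can count both candidates in binary mode first. This is a rough sketch only; `count_separators` is a hypothetical helper, and it reads the whole file at once for simplicity:

```python
def count_separators(filename):
    """Count CRLF pairs and bare LF bytes in a file read in binary mode."""
    with open(filename, 'rb') as f:
        data = f.read()
    crlf = data.count(b'\r\n')
    # Every CRLF also contains an LF, so subtract to get the bare LFs only.
    bare_lf = data.count(b'\n') - crlf
    return {'crlf': crlf, 'lf': bare_lf}
```

Whichever count dominates suggests the separator to count in binary mode.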

Use io.open(); the io set of file objects always opens the file in binary mode and then does the newline translation itself:
```
import io

with io.open(filename) as f:
    lines = sum(1 for line in f)
```
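
As a quick check of this approach, the sketch below writes a small temporary file with a CTRL-Z byte in the middle and counts its lines with io.open(). The file contents and the `count_lines` name are made up for illustration, and the `latin-1` encoding is an assumption chosen because it can decode any byte; use the encoding your file really has.

```python
import io
import os
import tempfile

def count_lines(filename):
    """Count lines using io.open(), which does its own newline translation."""
    # Assumption: latin-1 maps every byte to a character, so decoding never fails.
    with io.open(filename, encoding='latin-1') as f:
        return sum(1 for line in f)

# Build a sample file with a CTRL-Z (0x1A) byte at the start of the second line.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'first\r\n\x1asecond\r\nthird\r\n')

print(count_lines(path))  # prints 3: the 0x1A byte does not end the file
os.remove(path)
```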
