Iterate over the lines of two files simultaneously

Question

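I am trying to concatenate specific lines from two files. I want to append the contents of line 2 of file2 to line 2 of file1, then line 6 of file2 to line 6 of file1, and so on. Is there a way to iterate over both files simultaneously to do this? (It may help to know that each input file is about 15GB.)

Here is a simplified example:

File 1: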

Ignore
This is a
Ignore
Ignore
Ignore
This is also a
Ignore
Ignore

File 2:

Ignore
sentence
Ignore
Ignore
Ignore
sentence
Ignore
Ignore

Output file:

Ignore
This is a sentence
Ignore
Ignore
Ignore
This is also a sentence
Ignore
Ignore

Answer 1

You can use the built-in zip function to loop over several iterables at once.

Example

x = y = [1, 2, 3]
for a, b in zip(x, y):
    print(a, b)

The output will look like this:

1 1
2 2
3 3

The same principle applies to your files.

with open("/path/to/file-1") as file_1:
    with open("/path/to/file-2") as file_2:
        for line_1, line_2 in zip(file_1, file_2):
            print(line_1, line_2)

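Your output will be the matching lines from the two files printed together, separated by a single space. Each line keeps its trailing newline, so strip it first (for example with rstrip("\n")) if both parts should end up on one output line.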
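Applied to the example in the question, the idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the names join_selected and merge_files are hypothetical, and the "lines 2, 6, and so on" pattern is assumed to mean every fourth line starting at line 2, which the question does not state explicitly.

```python
def join_selected(lines_1, lines_2, start=2, step=4):
    """Yield the lines of lines_1, joining in the matching line of lines_2
    for line numbers start, start + step, ... (1-based).

    The start/step pattern is an assumption drawn from the question's
    "line 2, line 6, and so on".
    """
    for number, (line_1, line_2) in enumerate(zip(lines_1, lines_2), start=1):
        if number >= start and (number - start) % step == 0:
            # Drop the newline of the first part; the second part keeps its own.
            yield line_1.rstrip("\n") + " " + line_2
        else:
            yield line_1


def merge_files(path_1, path_2, out_path):
    """Stream both input files line by line and write the merged result."""
    with open(path_1) as file_1, open(path_2) as file_2, open(out_path, "w") as out:
        out.writelines(join_selected(file_1, file_2))


# Small in-memory demo using the data from the question.
file_1 = ["Ignore\n", "This is a\n", "Ignore\n", "Ignore\n",
          "Ignore\n", "This is also a\n", "Ignore\n", "Ignore\n"]
file_2 = ["Ignore\n", "sentence\n", "Ignore\n", "Ignore\n",
          "Ignore\n", "sentence\n", "Ignore\n", "Ignore\n"]
merged = list(join_selected(file_1, file_2))
print("".join(merged), end="")
```

Because join_selected is a generator and writelines consumes it one item at a time, only one pair of lines needs to be in memory at once. Note that zip stops at the end of the shorter input, so trailing lines of a longer file would be dropped.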

Answer 2

Python3:

with open('bigfile_1') as bf1:
    with open('bigfile_2') as bf2:
        for line1, line2 in zip(bf1, bf2):
            process(line1, line2)

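Importantly, bf1 and bf2 do not read the whole file in at once. They are iterators that know how to produce one line at a time.

zip() works well with iterators and itself returns an iterator, in this case one that yields pairs of lines for you to process.

Using with ensures the files are closed afterwards.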
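To see the lazy behaviour for yourself, here is a small sketch. It uses in-memory lists instead of real files, and the helper counted is a hypothetical name introduced only for this illustration; it counts how many lines have been pulled from each source.

```python
def counted(lines, counter):
    """Yield lines unchanged while counting how many have been pulled so far."""
    for line in lines:
        counter["n"] += 1
        yield line


pulled_1 = {"n": 0}
pulled_2 = {"n": 0}
pairs = zip(
    counted(["a\n", "b\n", "c\n"], pulled_1),
    counted(["x\n", "y\n", "z\n"], pulled_2),
)

# Creating the zip object pulls nothing; asking for one pair pulls one line each.
first = next(pairs)
print(first, pulled_1["n"], pulled_2["n"])
```

In Python 3, only one line has been taken from each source after the first next() call, which is why the same pattern stays cheap on very large files.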

Python 2.x

import itertools

with open('bigfile_1') as bf1:
    with open('bigfile_2') as bf2:
        for line1, line2 in itertools.izip(bf1, bf2):
            process(line1, line2)

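In Python 2.x, zip cannot be used the same way: it builds a full list rather than a lazy iterator, so with those 15GB files it would eat all of your system memory. We need itertools.izip, the lazy version of zip.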

python

python-2.x