What's an elegant way for one loop iteration to affect another?

Question

I just had to process a configuration file. Because of the way it was generated, it contains lines like this:

---(more 15%)---

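The first step is to strip out these unwanted lines. As a slight twist, each of these lines is followed by a blank line, which I also want to remove. I wrote a quick Python script to do this: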

import sys

skip_next = False
for line in sys.stdin:
    if skip_next:
        # The previous line was a '---(more' marker: drop this line too.
        skip_next = False
        continue
    if line.startswith('---(more'):
        skip_next = True
        continue
    print line,

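Now, this works, but it's hackier than I'd like. The difficulty is that, while looping line by line, we want the content of one line to affect how the next line is handled. Hence my question: what's an elegant way for one loop iteration to affect another?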

Answer 1

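The reason this feels awkward is that you're fundamentally doing it wrong. A for loop is supposed to be a sequential iteration over each element of a series. If you call continue without even looking at the current element, based on what happened at the previous element of the series, you're breaking that basic abstraction. You then end up introducing clumsy extra moving parts to prop up the square-peg-in-a-round-hole solution you've set up.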

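Instead, try to keep the action close to the condition that causes it. We know that a for loop is just syntactic sugar for a special case of a while loop, so let's use that. Pseudocode, because I'm not familiar with Python's I/O subsystem: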

while not sys.stdin.eof: //or whatever
    line = sys.stdin.ReadLine()
    if line.startswith('---(more'):
        sys.stdin.ReadLine() //read the next line and ignore it
        continue    
    print line
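
For reference, the pseudocode above can be turned into runnable Python along the following lines. This is only a minimal sketch in Python 3, not the answerer's code: the function name `strip_more_lines` and the use of `io.StringIO` as a stand-in for `sys.stdin` are assumptions made for illustration.

```python
import io

def strip_more_lines(stream):
    """Return the lines of stream, minus each '---(more' line and the line after it."""
    kept = []
    while True:
        line = stream.readline()
        if not line:  # readline() returns '' only at end of input
            break
        if line.startswith('---(more'):
            stream.readline()  # read the next line and ignore it
            continue
        kept.append(line)
    return kept

# Hypothetical sample input standing in for the real configuration file.
sample = io.StringIO("a\n---(more 15%)---\n\nb\n")
print(''.join(strip_more_lines(sample)), end='')
```

The end-of-input check relies on `readline()` returning an empty string at EOF, whereas a blank line comes back as `'\n'`, so blank lines in the middle of the file do not end the loop early.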

Answer 2

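Another approach is to use itertools.tee, which lets you split one iterator into two. You can then offset one copy by a single step, so that it runs one line behind the other. Zipping the two iterators gives you the current line and the previous line at each step of the for loop (the trailing copy is padded with an empty string at the front, so the first line still gets a partner and no line is dropped):

import sys
from itertools import chain, izip, tee

in1, in2 = tee(sys.stdin, 2)
# Pad in2 at the front so it runs one line behind in1.
in2 = chain([''], in2)
for line, prevline in izip(in1, in2):
    if line.startswith('---(more') or prevline.startswith('---(more'):
        continue
    print line,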

This can also be done as an equivalent generator expression:

import sys
from itertools import chain, izip, tee

in1, in2 = tee(sys.stdin, 2)
# Pad in2 at the front so it runs one line behind in1.
pairs = izip(in1, chain([''], in2))
res = (line for line, prevline in pairs
       if not line.startswith('---(more') and not prevline.startswith('---(more'))
for line in res:
    print line,

Or you can use filter, which drops the items of an iterator for which a condition does not hold:

import sys
from itertools import chain, izip, tee

in1, in2 = tee(sys.stdin, 2)
# Pad in2 at the front so it runs one line behind in1.
pairs = izip(in1, chain([''], in2))
cond = lambda pair: not pair[0].startswith('---(more') and not pair[1].startswith('---(more')
res = filter(cond, pairs)
# filter keeps whole (line, prevline) pairs, so unpack them before printing.
for line, prevline in res:
    print line,
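
The tee-based pairing can also be packaged as a small generator. The following is a hedged sketch in Python 3 (where `zip` is already lazy), not code from the answer; the function name `drop_marked` and its `marker` parameter are hypothetical names chosen for illustration.

```python
from itertools import chain, tee

def drop_marked(lines, marker='---(more'):
    """Yield lines that are neither a marker line nor directly after one."""
    current, previous = tee(lines)
    # Pad the front so every line is paired with its predecessor
    # (the first line gets '' as its predecessor).
    previous = chain([''], previous)
    for line, prevline in zip(current, previous):
        if line.startswith(marker) or prevline.startswith(marker):
            continue
        yield line

# Hypothetical sample input standing in for sys.stdin.
sample = ['a\n', '---(more 15%)---\n', '\n', 'b\n']
print(''.join(drop_marked(sample)), end='')
```

Because `zip` stops as soon as `current` is exhausted, the extra padded element never produces a spurious trailing pair.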

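If you are willing to go outside the Python standard library, the toolz package makes this easier. It provides a sliding_window function that turns an iterator such as a b c d e f into something like (a,b), (b,c), (c,d), (d,e), (e,f). This is essentially the same as the tee approach above; it just combines three lines into one: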

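import sys
from itertools import chain
from toolz.itertoolz import sliding_window

# Each window is (earlier line, later line); the leading '' pairs up the first line.
for prevline, line in sliding_window(2, chain([''], sys.stdin)):
    if line.startswith('---(more') or prevline.startswith('---(more'):
        continue
    print line,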

You can also use remove, which is basically the opposite of filter, to drop the items without needing a condition inside the for loop:

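import sys
from itertools import chain
from toolz.itertoolz import sliding_window, remove

# Each window is (previous line, current line); the leading '' pairs up the first line.
pairs = sliding_window(2, chain([''], sys.stdin))
cond = lambda x: x[0].startswith('---(more') or x[1].startswith('---(more')
res = remove(cond, pairs)
for prevline, line in res:
    print line,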
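
If installing toolz is not an option, a size-2 sliding window is easy to approximate with the standard library. The sketch below is written in Python 3 and is an assumption-laden stand-in, not toolz's implementation; `sliding_pairs` and `drop_marked` are hypothetical helper names.

```python
from collections import deque
from itertools import chain

def sliding_pairs(iterable):
    """Yield overlapping (previous, current) tuples: a window of size 2."""
    window = deque(maxlen=2)  # the oldest item falls out automatically
    for item in iterable:
        window.append(item)
        if len(window) == 2:
            yield tuple(window)

def drop_marked(lines, marker='---(more'):
    """Yield lines that are neither a marker line nor directly after one."""
    # Prepend '' so the first real line also appears as a "current" item.
    for prevline, line in sliding_pairs(chain([''], lines)):
        if line.startswith(marker) or prevline.startswith(marker):
            continue
        yield line

# Hypothetical sample input standing in for sys.stdin.
sample = ['a\n', '---(more 15%)---\n', '\n', 'b\n']
print(''.join(drop_marked(sample)), end='')
```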

Answer 3

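In this case, we can skip a line by manually advancing the iterator. This leads to code somewhat similar to Mason Wheeler's solution, but it still uses iteration syntax (there is a related Stack Overflow question about this):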

import sys

for line in sys.stdin:
    if line.startswith('---(more'):
        # Skip the following line; the default avoids StopIteration at end of input.
        next(sys.stdin, None)
        continue
    print line,
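
The same idea works for any iterable, not just a file object, as long as the for loop and the manual `next()` call share one iterator. Here is a minimal Python 3 sketch under that assumption; the generator name `drop_marked` and its `marker` parameter are hypothetical.

```python
def drop_marked(lines, marker='---(more'):
    """Yield lines, skipping each marker line and the single line after it."""
    it = iter(lines)  # one shared iterator keeps next() and the for loop in step
    for line in it:
        if line.startswith(marker):
            # Consume the following line; the default of None means a marker
            # on the very last line does not raise StopIteration.
            next(it, None)
            continue
        yield line

# Hypothetical sample input standing in for sys.stdin.
sample = ['a\n', '---(more 15%)---\n', '\n', 'b\n']
print(''.join(drop_marked(sample)), end='')
```

Passing a default to `next()` matters inside a generator in particular, because since Python 3.7 a StopIteration escaping a generator body is converted into a RuntimeError.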

python

text-processing