For loop when iterating over lines in a file?

Question

I have a loop as follows:

for line in FILE:
    if 'MyExpression' in line:
        # Pull the first number out of this line and put it in a list
        # Pull the first number out of the NEXT line that has either 'MyExpression' or 'MyExpression2', and put it in a list
        pass

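Basically, I want to find a line where 'MyExpression' exists and pull a number out of that line, which marks the start of a trial. Then I want to jump to the next line that contains MyExpression or MyExpression2 and extract a number from that line as the offset of my trial. I want to go through my entire file, so that I end up with two lists: one for the starts and one for the offsets.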

I know how to do this in Matlab, but in Python I'm not sure how to tell it to look at the next line. Something like if ('MyExpression' in line+1) or ('MyExpression2' in line+1)?

Update: Sorry for the late reply, but my file might look like this:

1234 MyExpression Blah Blah
3452 Irrelevant Blah Blah
4675 MyExpression2 Blah Blah
5234 MyExpression Blah Blah
6666 MyExpression Blah Blah

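I want two arrays/lists: basically [1234, 5234] and [4675, 6666], which correspond to the starts and the offsets. I'll play around with the current answers and see if any of them does this, thanks!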

Answer 1

File objects are iterators, which means that you can advance them with next():

for line in FILE:
    if ('MyExpression' in line):
        next_line = next(FILE, None)

Note that the default value, None, is returned if you reach the end of the file. Without it, a StopIteration exception is raised.
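A minimal runnable sketch of this, assuming io.StringIO as a stand-in for the real file and reusing the sample lines from the question's update. Keep in mind that next() returns the line immediately after the match, which is not necessarily the next line containing MyExpression or MyExpression2, and that the line it returns is then skipped by the for loop.

```python
import io

# io.StringIO stands in for a real file object; both iterate line by line.
sample = io.StringIO(
    "1234 MyExpression Blah Blah\n"
    "3452 Irrelevant Blah Blah\n"
    "4675 MyExpression2 Blah Blah\n"
    "5234 MyExpression Blah Blah\n"
    "6666 MyExpression Blah Blah\n"
)

pairs = []
for line in sample:
    if 'MyExpression' in line:
        # Advance the same iterator by one line; None once the file is exhausted.
        next_line = next(sample, None)
        first = int(line.split()[0])
        following = int(next_line.split()[0]) if next_line is not None else None
        pairs.append((first, following))

print(pairs)  # [(1234, 3452), (4675, 5234), (6666, None)]
```

The output shows both effects: 3452 is paired only because it is the very next line, and 5234 never starts a pair because next() consumed it.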

Answer 2

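In the body of the for line in afile: loop, the next line has not been read yet; however, you can keep reading the following lines inside that loop body. For example: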

for line in afile:
    if 'MyExpression' in line:
        # ...the number extraction, e.g. with a regular expression, then:
        for nextline in afile:
            if 'MyExpression' in nextline or 'MyExpression2' in nextline:
                # the other number extraction, then
                break  # done with the inner loop

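Note that this consumes some (or all) of what remains in afile. If you need to iterate over that part again, you would need itertools.tee to make two "clones" of the afile iterator and loop over the "clones" instead. However, as I understand your question, that isn't necessary for your specific requirements (and it's a bit tricky, so I won't go into detail).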
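The itertools.tee case mentioned above could look roughly like the sketch below. It is a variant, not the answer's code: it assumes the lines are already in memory (SAMPLE_LINES is made up from the question's sample file), and it deliberately lets each matching line serve both as an offset and as a new start, which is exactly the situation where the remaining lines have to be scanned again. Because each lookahead copy runs ahead of the main copy, tee() buffers the lines in between in memory.

```python
from itertools import tee

# Hypothetical sample data: the lines of a.txt from the question.
SAMPLE_LINES = [
    "1234 MyExpression Blah Blah\n",
    "3452 Irrelevant Blah Blah\n",
    "4675 MyExpression2 Blah Blah\n",
    "5234 MyExpression Blah Blah\n",
    "6666 MyExpression Blah Blah\n",
]

def starts_and_offsets(lines):
    """Pair every matching line with the next matching line, letting the
    main loop revisit lines that a lookahead has already scanned."""
    starts, offsets = [], []
    main = iter(lines)
    while True:
        line = next(main, None)
        if line is None:  # end of input
            break
        if 'MyExpression' in line:
            # tee() returns two independent iterators; from here on only the
            # returned 'main' copy is advanced, never the iterator passed in.
            main, ahead = tee(main)
            for nextline in ahead:
                # 'MyExpression' is a substring of 'MyExpression2', so this
                # test matches both kinds of line.
                if 'MyExpression' in nextline:
                    starts.append(int(line.split()[0]))
                    offsets.append(int(nextline.split()[0]))
                    break
    return starts, offsets

print(starts_and_offsets(SAMPLE_LINES))
# ([1234, 4675, 5234], [4675, 5234, 6666])
```

A while loop is used instead of for line in main because the for statement would keep pulling from the original iterator even after main is rebound to a tee copy.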

For example, if a.txt is the sample file you provided:

1234 MyExpression Blah Blah
3452 Irrelevant Blah Blah
4675 MyExpression2 Blah Blah
5234 MyExpression Blah Blah
6666 MyExpression Blah Blah

then this sample code:

with open('a.txt') as afile:
    results = []
    for line in afile:
        if 'MyExpression' in line:
            first = int(line.split()[0])
            for nextline in afile:
                if 'MyExpression' in nextline or 'MyExpression2' in nextline:
                    second = int(nextline.split()[0])
                    results.append([first, second])
                    break  # done with the inner loop
    print(results)

emits

[[1234, 4675], [5234, 6666]]

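I don't know what algorithm you have in mind that would instead produce

[1234, 5234] and [4675, 6666]

What logical specification would cause 4675 to be ignored for the first pair but then reconsidered as the start of the second pair? I certainly can't see anything in the text of your Q that specifies it, so please edit that text so that your specs match your actual intent!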
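If the two lists in the question's update are instead meant as one list of starts and one list of offsets (a reading this answer does not assume), the pairs printed above can simply be transposed with zip(*...). A small sketch, hard-coding the results value from the output above:

```python
# Pairs as printed by the example above: [start, offset] for each trial.
results = [[1234, 4675], [5234, 6666]]

# zip(*results) groups all first elements together and all second elements together.
starts, offsets = (list(group) for group in zip(*results))

print(starts)   # [1234, 5234]
print(offsets)  # [4675, 6666]
```

If results is empty, zip yields nothing and the two-name unpacking raises ValueError, so that case needs a separate check.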

Answer 3

Hope this helps... it looks for "Expression" and prints the matching lines in pairs.

text = "Expression"

# Keep only the lines that contain text
with open('test.log') as log_file:
    the_lines = [line.strip() for line in log_file if text in line]

# Make pairs (0,1), (2,3), etc.; an unpaired last line is dropped
duples = [(the_lines[2*i], the_lines[2*i+1]) for i in range(len(the_lines) // 2)]

# Show me...
for pair in duples:
    print(pair)

You should replace line.strip() with your own function to extract the number you're looking for.
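One possible replacement for line.strip(), assuming (as in the sample file) that the number of interest is the run of digits at the start of each line. The name first_number is hypothetical.

```python
import re

# Leading integer, optionally preceded by whitespace (assumed line format).
_LEADING_INT = re.compile(r'\s*(\d+)')

def first_number(line):
    """Return the integer at the start of line, or None if there is none."""
    match = _LEADING_INT.match(line)
    return int(match.group(1)) if match else None

print(first_number("1234 MyExpression Blah Blah"))  # 1234
```

It would then be used as the_lines = [first_number(line) for line in log_file if text in line].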

Note: I don't like using indices to build the pairs, but it's simpler than using iterators.
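For comparison, an iterator-based version can be fairly short: passing the same iterator to zip twice makes it take consecutive items two at a time. A sketch that uses the question's sample lines in place of test.log:

```python
# Lines from the question's sample file, standing in for test.log.
lines = [
    "1234 MyExpression Blah Blah",
    "3452 Irrelevant Blah Blah",
    "4675 MyExpression2 Blah Blah",
    "5234 MyExpression Blah Blah",
    "6666 MyExpression Blah Blah",
]

text = "Expression"
the_lines = [line.strip() for line in lines if text in line]

# Both arguments are the same iterator, so zip pulls items (0, 1), (2, 3), ...;
# an unpaired trailing item is dropped, as in the index-based version.
it = iter(the_lines)
duples = list(zip(it, it))

for pair in duples:
    print(pair)
```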

