Need to Read a Line Ahead Without Reading Two Lines at a Time (Python)

Question

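I'm writing Python code that reads a text file line by line and prints a line together with the next line, but only if the line starts with ">" and the next line starts with "G". To illustrate, I want the following input file...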

>mm10_sample_name_here
GATCGATGCTGCTAGTAGCATG
>mm10_sample_name_here
>mm10_sample_name_here
AATCGATGCTGCTAGTAGCATG
>mm10_sample_name_here
>mm10_sample_name_here
>mm10_sample_name_here
GATCGATGCTGCTAGTAGCATG

to produce this output...

>mm10_sample_name_here
GATCGATGCTGCTAGTAGCATG
>mm10_sample_name_here
GATCGATGCTGCTAGTAGCATG
>mm10_sample_name_here
GATCGATGCTGCTAGTAGCATG

I tried using next() as shown below...

original_file = 'test_input_file.txt'
file_destination = 'test_output_file.txt'

import os
if os.path.exists(file_destination):
  os.remove(file_destination)

f=open(original_file, 'r+')

for line in f:
  try:
    line2 = next(f)
  except StopIteration:
    line2 = ""
  if line2.startswith("G") and line.startswith(">"):
    with open(file_destination, "a") as myfile:
       myfile.write(line)
       myfile.write(line2)

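However, this reads the input file two lines at a time, which means that once one line fails the if condition, none of the remaining lines match either. Any help with this would be great. Thanks.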

Answer 1

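As you noticed, your solution doesn't work. Each pass through the loop advances the file iterator by two items: once for the for loop itself and once more because you call next(). You need a strategy that advances the iterator only once per pass. One option is to keep state while looping, e.g.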

previous_line = ""
for line in f:
  if line.startswith("G") and previous_line.startswith(">"):
    ...
  previous_line = line
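
A runnable version of this state-keeping idea might look like the sketch below. The function name matching_pairs and the sample lines are made up for illustration and are not part of the original code; any iterable of lines, including an open file, should work in place of the list.

```python
def matching_pairs(lines):
    """Yield (header, sequence) pairs where a '>' line is directly followed by a 'G' line."""
    previous_line = ""
    for line in lines:
        # Each line is consumed exactly once; the previous one is remembered.
        if line.startswith("G") and previous_line.startswith(">"):
            yield previous_line, line
        previous_line = line


# Hypothetical sample data, shortened from the question's input.
sample = [">s1\n", "GATC\n", ">s2\n", ">s3\n", "AATC\n", ">s4\n", "GGCC\n"]
pairs = list(matching_pairs(sample))
print(pairs)
```

Because previous_line is overwritten on every pass, a run of several ">" lines leaves only the last one available to pair with a following "G" line.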

You could also keep the next() function and use e.g. while True:, but watch out for the edge case where several consecutive lines start with ">".
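
One way the next() variant could be written is sketched here, assuming the lookahead line is carried over as the current line for the next pass, so nothing is skipped. The name matching_pairs_lookahead and the sample lines are hypothetical.

```python
def matching_pairs_lookahead(lines):
    """Yield (header, sequence) pairs using an explicit one-line lookahead."""
    it = iter(lines)
    try:
        current = next(it)
    except StopIteration:
        return  # empty input
    while True:
        try:
            following = next(it)
        except StopIteration:
            return  # no line left to look ahead to
        if current.startswith(">") and following.startswith("G"):
            yield current, following
        # The lookahead line becomes the current line, so the iterator
        # advances by one line per pass, not two.
        current = following


# Hypothetical sample with three consecutive '>' lines.
sample = [">s1\n", ">s2\n", ">s3\n", "GATC\n", "AATC\n"]
lookahead_pairs = list(matching_pairs_lookahead(sample))
print(lookahead_pairs)
```

With consecutive ">" lines, only the header directly before the "G" line is paired, which is the edge case mentioned above.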

Answer 2

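Here's my best guess at what you want to do. You left out some edge conditions, such as getting a G on the very first line, so the description must be incomplete.

Here's a simple procedural approach that triggers on 'G' and prints only if the previous line was a '>' line. That's easier than looking ahead.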

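last_line = None  # the '>' line just before the current line, if any

# original_file is the input path from the question
with open(original_file) as f:
    for line in f:
        if line.startswith('>'):
            last_line = line
        elif line.startswith('G') and last_line:
            # lines keep their trailing newline, so suppress print's own
            print(last_line, end='')
            print(line, end='')
            last_line = None
        else:
            # any other line breaks the '>' then 'G' adjacency
            last_line = None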
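
If the pairs should go to a file, as in the question, rather than to the screen, the same idea could be sketched as follows. Opening the destination once in "w" mode truncates any old output, which would make the os.remove() call and the repeated append-mode opens unnecessary. The function name copy_matching_pairs and the temporary sample file are assumptions for illustration only.

```python
import os
import tempfile


def copy_matching_pairs(src_path, dst_path):
    """Write each '>' line that is directly followed by a 'G' line, plus that 'G' line."""
    last_header = None
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                last_header = line
            elif line.startswith("G") and last_header is not None:
                dst.write(last_header)
                dst.write(line)
                last_header = None
            else:
                # Any other line breaks the header/sequence adjacency.
                last_header = None


# Hypothetical demo using a temporary directory and shortened sample data.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "test_input_file.txt")
    dst_path = os.path.join(tmp, "test_output_file.txt")
    with open(src_path, "w") as f:
        f.write(">s1\nGATC\n>s2\n>s3\nAATC\n>s4\nGGCC\n")
    copy_matching_pairs(src_path, dst_path)
    with open(dst_path) as f:
        result = f.read()

print(result, end="")
```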

python, iteration, loops, readline, next