Python: print lines after context

Question

How can I print the two lines that follow a line of interest in Python?

Example.fastq

@read1
AAAGGCTGTACTTCGTTCCAGTTG
+
'(''%$'))%**)2+'.(&&'/5-
@read2
CTGAGTTGAGTTAGTGTTGACTC
+
)(+-0-2145=588..,(1-,12

I can find the line of interest using...

fastq = open("Example.fastq", "r")

IDs = ["read1"]

with fastq as fq:
    for line in fq:
        if any(string in line for string in IDs):
            ...  # header line found; how do I print the lines that follow?

Now that I have found read1, I want to print the lines that follow it. In bash I would probably use something like grep -A for this. The desired output lines are shown below.

+
'(''%$'))%**)2+'.(&&'/5-

But I can't seem to find an equivalent tool in Python. Perhaps "islice" could work, but I don't know how to make islice start from the position of the match.

with fastq as fq:
    for line in fq:
        if any(string in line for string in IDs):
            print(list(islice(fq,3,4)))

Answer 1

You can use next() to advance an iterator (including a file):

print(next(fq))
print(next(fq))

This consumes those lines, so the for loop will continue with @read2.

If you don't want the AAA... line, you can consume it with next(fq) as well. In full:

fastq = open("Example.fastq", "r")

IDs = ["read1"]

with fastq as fq:
    for line in fq:
        if any(string in line for string in IDs):
            next(fq)  # skip AAA line
            print(next(fq).strip())  # strip off the extra newlines
            print(next(fq).strip())

This gives

+
'(''%$'))%**)2+'.(&&'/5-
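The same idea can be wrapped in a small helper that behaves a bit like grep -A, using islice on the very iterator the for loop is consuming. This is only a sketch: the name lines_after and its skip/count parameters are made up for illustration, and the substring test means "read1" would also match "@read10".

```python
from itertools import islice


def lines_after(handle, ids, skip=1, count=2):
    """Yield `count` lines after each matching line, skipping `skip` lines first.

    Hypothetical helper, not part of any library.
    """
    it = iter(handle)  # make sure the loop and islice share one iterator
    for line in it:
        if any(s in line for s in ids):
            # islice advances the shared iterator: it drops `skip` lines,
            # then yields the next `count` lines.
            for following in islice(it, skip, skip + count):
                yield following.rstrip("\n")
```

With the example file, list(lines_after(open("Example.fastq"), ["read1"])) should produce the '+' line and the quality line for read1, and the outer loop then carries on from @read2.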

Answer 2

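If you are working with FASTQ files, you are better off using a bioinformatics library such as BioPython than rolling your own parser.

To get exactly the result you asked for, you can do: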

from Bio.SeqIO.QualityIO import FastqGeneralIterator

IDs = ['read1']

with open('Example.fastq') as in_handle:
    for title, seq, qual in FastqGeneralIterator(in_handle):
        # The ID is the first word in the title line (after the @ sign):
        if title.split(None, 1)[0] in IDs:
            # Line 3 is always a '+', optionally followed by the same sequence identifier again.
            print('+') 
            print(qual)

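However, the quality line is not much use on its own. Your next step will almost certainly be to convert it to Phred quality scores. But this is notoriously complicated because there are at least three different and incompatible variants of the FASTQ file format. BioPython handles all the edge cases for you, so you can simply do: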

from Bio.SeqIO import parse

IDs = ['read1']

with open('Example.fastq') as in_handle:
    for record in parse(in_handle, 'fastq'):
        if record.id in IDs:
            print(record.letter_annotations["phred_quality"])
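To give a feel for what that conversion involves in the simplest case, here is a minimal sketch that assumes the Sanger-style encoding (ASCII offset 33). The function name phred_scores is made up for illustration. Older Illumina and Solexa files use an offset of 64 instead, which is exactly why letting the library decide is safer.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumption: Sanger-style encoding (offset 33). Hypothetical helper,
    not a BioPython function.
    """
    return [ord(char) - offset for char in quality]


# Quality line of read1 from the example file.
scores = phred_scores("'(''%$'))%**)2+'.(&&'/5-")
print(scores[:4])  # first four scores: [6, 7, 6, 6]
```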
