Getting records that differ between two FASTQ files

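I have two FASTQ files, F1.fastq and F2.fastq. F2.fastq is the smaller file and contains a subset of the reads in F1.fastq. I want the reads in F1.fastq that are not in F2.fastq. The following Python code does not seem to work. Can you suggest modifications?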

from Bio import SeqIO

needed_reads = []
reads_array = []
chosen_array = []

for x in SeqIO.parse("F1.fastq", "fastq"):
    reads_array.append(x)

for y in SeqIO.parse("F2.fastq", "fastq"):
    chosen_array.append(y)

for y in chosen_array:
    for x in reads_array:
        if str(x.seq) != str(y.seq):
            needed_reads.append(x)

output_handle = open("DIFF.fastq", "w")
SeqIO.write(needed_reads, output_handle, "fastq")
output_handle.close()

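Your nested loop appends a read from F1.fastq once for every read in F2.fastq that it differs from, so almost every read ends up in the output, many times over. You can use sets to do what you want: if you convert list1 to a set and list2 to a set, then set(list1) - set(list2) gives the items in list1 that are not in list2. SeqRecord objects do not define content-based equality or hashing, so sets of the records themselves will not match reads across the two files. Build the set from a hashable key instead, such as the read ID (or str(record.seq) if you want to match on sequence, as in your code).

Sample code:

from Bio import SeqIO

# Collect the IDs of the reads in the subset file.
# This assumes read IDs are unique and identical in both files.
chosen_ids = set()
for y in SeqIO.parse("F2.fastq", "fastq"):
    chosen_ids.add(y.id)

# Keep only the reads from F1.fastq whose ID is not in F2.fastq.
needed_reads = []
for x in SeqIO.parse("F1.fastq", "fastq"):
    if x.id not in chosen_ids:
        needed_reads.append(x)

# The with block closes the output file automatically.
with open("DIFF.fastq", "w") as output_handle:
    SeqIO.write(needed_reads, output_handle, "fastq")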
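
If F1.fastq is too large to hold in memory, the same set-based filter can be applied one record at a time. The following is only a rough sketch that avoids Biopython entirely. It assumes strict four-line FASTQ records (no wrapped sequence lines) and unique read IDs taken from the first token of the header line. The names read_fastq and write_difference are made up for this illustration, and the in-memory strings stand in for the two files.

```python
import io


def read_fastq(handle):
    """Yield (read_id, record_text) for each 4-line FASTQ record in handle."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        # Assumption: the read ID is the first whitespace-delimited
        # token after the leading '@' of the header line.
        read_id = header[1:].split()[0]
        yield read_id, header + seq + plus + qual


def write_difference(all_reads, subset_reads, out):
    """Write records from all_reads whose ID is absent from subset_reads.

    Returns the number of records written.
    """
    # Only the IDs of the (smaller) subset file are kept in memory.
    subset_ids = {read_id for read_id, _ in read_fastq(subset_reads)}
    kept = 0
    for read_id, record in read_fastq(all_reads):
        if read_id not in subset_ids:
            out.write(record)
            kept += 1
    return kept


# Tiny in-memory stand-ins for F1.fastq and F2.fastq.
f1 = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n")
f2 = io.StringIO("@r2\nGGCC\n+\nIIII\n")
diff = io.StringIO()
kept_count = write_difference(f1, f2, diff)
```

With real files you would pass handles from open("F1.fastq"), open("F2.fastq") and open("DIFF.fastq", "w") instead of the StringIO objects. Only the IDs from F2.fastq are stored, so memory use depends on the smaller file rather than on F1.fastq.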