Writing a file using two different texts as input in Python

Question

I have two large .fasta files that look like this:

File 1:

>A01
ABCDGENG
>A02
JALSKDLAS
#and so on

File 2:

>KJ01
KGLW
>XB02
CTRIPIO
#and so on

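I want to generate a separate file for each pair of entries (both files contain the same number of entries), so the first output would look like this: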

>A01
ABCDGENG
>KJ01
KGLW

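As an additional detail, each output file should be named after the entry from the first file, so this example would be called A01.fasta.

I already have a script that works well when I only have one large file, but I need some guidance on how to add the second part to each individual file. Here is the script: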

import os
from os import path
import sys

infile=open("D:/path_to_file")
os.system("D:/path_to_project") # Add your project directory in here

path = "D:/path_to_project"

opened = False # Assume outfile is not open
i=0
for line_ref in infile:
    if line_ref[0] == ">": # If line begins with ">"
        i =i+1 #in case that there are files with the same name
        if(opened): 
            outfile.close() # Will close the outfile if it is open 
        opened = True # Set opened to True to represent an opened outfile
        contig_name = line_ref[1:].rstrip() # Extract contig name: drop the leading ">" and strip trailing whitespace and newlines
        print("contig: " + contig_name)
        outfile=open(path + "/" + str(contig_name) +"-"+ str(i)+ ".fasta", 'w')
    outfile.write(line_ref)    
outfile.close()
print("Fin")

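But I don't know how to read the lines of the other file (File 2) and add them below those from the first file without closing the output file and opening it again. Thanks in advance!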

Answer 1

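Doing this with plain file operations gets complicated, because a FASTA sequence can span a variable number of lines. It is better to parse the files with a library such as pyfastx: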

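import pyfastx

fa1 = pyfastx.Fastx('file1.fasta')
fa2 = pyfastx.Fastx('file2.fasta')

# Note: depending on the pyfastx version, Fastx may yield (name, seq) pairs
# instead of (name, seq, comment); adjust the unpacking if needed.
for index, ((name1, seq1, comment1), (name2, seq2, comment2)) in enumerate(zip(fa1, fa2), 1):
    with open(f"outfile{index}.fasta", "w") as out:
        # Each header line needs its own ">" prefix.
        out.write(f">{name1}\n{seq1}\n>{name2}\n{seq2}\n")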
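
If installing a library is not an option, the same pairing can be sketched with the standard library alone. This is only an illustration under a few assumptions: the helper names read_fasta and write_pairs are made up for this example, each header's first word is assumed to be safe to use as a file name, and both inputs are assumed to hold the same number of records (zip stops silently at the shorter one). The small parser joins sequences that span several lines, which is the part that makes naive line-by-line handling awkward.

```python
import os


def read_fasta(path):
    """Yield (header, sequence) pairs; a sequence may span several lines."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    # Emit the last record once the file is exhausted.
    if header is not None:
        yield header, "".join(chunks)


def write_pairs(file1, file2, out_dir):
    """Write one FASTA file per record pair, named after the file-1 record."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    pairs = zip(read_fasta(file1), read_fasta(file2))
    for index, ((name1, seq1), (name2, seq2)) in enumerate(pairs, 1):
        # Use the first word of the header as the file name (assumption),
        # falling back to a numbered name if the header is empty.
        words = name1.split()
        stem = words[0] if words else f"record{index}"
        out_path = os.path.join(out_dir, f"{stem}.fasta")
        with open(out_path, "w") as out:
            out.write(f">{name1}\n{seq1}\n>{name2}\n{seq2}\n")
        written.append(out_path)
    return written
```

Calling write_pairs("file1.fasta", "file2.fasta", "D:/path_to_project") would then produce A01.fasta, A02.fasta and so on in that directory. Records in the first file that share a name would overwrite each other, so appending the index to the file name, as the script in the question does, is a reasonable safeguard.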
