Substring multifasta file using python
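I am trying to extract the sequence from position 2 to 8 (the microRNA seed) of every record in a multifasta file. I wrote a small python script for this. The script works, but I can't get it to write an output file. Can anyone help me or point me in the right direction?
Thanks
Script: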
from Bio import SeqIO

for record in SeqIO.parse("file.fasta", "fasta"):
    # Positions 2-8 (1-based) correspond to the 0-based slice [1:8].
    seed = record.seq[1:8]
    print(">" + record.id + "\n" + str(seed))
Output:
>aga-miR-12417-5p
AGUCGUU
>aga-miR-12418-3p
GUUCGAU
>aga-miR-12419-5p
GCUGUUC
You just need to use SeqIO.write(). This is the approach closest to your current structure:
from Bio import SeqIO

with open("out_file.fasta", "w") as out_f:
    for record in SeqIO.parse("file.fasta", "fasta"):
        # Trim each record to the seed before writing it out.
        record.seq = record.seq[1:8]
        SeqIO.write(record, out_f, "fasta")
If you only need to write a simple plain-text file, you can do it like this:
from Bio import SeqIO

with open("output_file", "w") as output:
    for record in SeqIO.parse("file.fasta", "fasta"):
        seed = record.seq[1:8]
        # write() does not append a newline, so add one explicitly.
        output.write(">" + record.id + "\n")
        output.write(str(seed) + "\n")
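If Biopython is not available, the same seed extraction can be sketched with the standard library alone. This is only a rough illustration, not a replacement for SeqIO: the helper names iter_seeds and write_seeds are made up for this example, the parser assumes a well-formed FASTA file, and the demo records are invented rather than taken from the question.

```python
import io


def iter_seeds(handle, start=1, end=8):
    """Yield (record_id, seed) pairs from an open FASTA handle.

    start and end are 0-based slice bounds, so the default [1:8]
    covers 1-based positions 2 through 8.
    """
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)[start:end]
            # The ID is the first whitespace-delimited token of the header.
            record_id = (line[1:].split() or [""])[0]
            chunks = []
        else:
            # Sequences may be wrapped over several lines.
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)[start:end]


def write_seeds(in_handle, out_handle):
    """Write one '>id' line and one seed line per record; return the count."""
    count = 0
    for record_id, seed in iter_seeds(in_handle):
        out_handle.write(f">{record_id}\n{seed}\n")
        count += 1
    return count


# Invented demo records (not real miRNA sequences).
demo_in = io.StringIO(
    ">example-1 made-up record\nACGUACGU\nACGU\n>example-2\nUUGCAUAGUC\n"
)
demo_out = io.StringIO()
n_written = write_seeds(demo_in, demo_out)
```

With real files you would pass open("file.fasta") and open("out_file.fasta", "w") instead of the StringIO objects. For anything beyond a quick script, SeqIO is still the safer choice because it handles more of the format's edge cases.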