Extract a part of fasta sequence based on bp coordinates
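I have a huge fasta file, but I only need to extract part of it, given that I know the start and end base-pair coordinates of my sequence. The output should also be in fasta format, with 60 bp per line. This is my attempt; please let me know if it looks OK. Any suggestions for improvement are welcome.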
from Bio import SeqIO
inFile = open('full_chr.fa','r')
fw=open("part.fa",'w')
line_width = 60
for record in SeqIO.parse(inFile,'fasta'):
    fw.write(">" + record.id + "\n")
    fww = (str(record.seq[600130000:602000000]) + '\n')
    for i in xrange(0,len(fww),line_width):
        fw.write(str(fww[i:i+line_width]) + '\n')
fw.close()
It's as simple as this:
from Bio import SeqIO
record = SeqIO.read("Chromosome.fas", "fasta")
with open("output.fas", "w") as out:
    SeqIO.write(record[100:500], out, "fasta")
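A note on the coordinates in a slice like record[100:500]: Python slices are 0-based and end-exclusive, whereas bp coordinates are often quoted as 1-based and inclusive. Which convention your coordinates follow is an assumption you need to check. The sketch below uses plain strings instead of Biopython records, and the helper names bp_to_slice and wrap are hypothetical, not part of any library.

```python
def bp_to_slice(start_bp, end_bp):
    """Convert 1-based inclusive bp coordinates (assumed convention)
    to a 0-based, end-exclusive Python slice."""
    if start_bp < 1 or end_bp < start_bp:
        raise ValueError("expected 1 <= start_bp <= end_bp")
    return slice(start_bp - 1, end_bp)


def wrap(seq, width=60):
    """Split a sequence string into lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


seq = "ACGT" * 50                      # toy 200 bp sequence standing in for a chromosome
region = seq[bp_to_slice(101, 180)]    # bases 101..180 inclusive, i.e. 80 bp
lines = wrap(region, 60)               # one 60 bp line followed by one 20 bp line
```

The same slice object can be applied to a Biopython record, since records are sliced with ordinary Python slice semantics.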
SeqIO.write already wraps lines at 60 characters. If you want to control the line wrapping, use a FastaWriter object. Here is an example with 80 bp lines:
from Bio import SeqIO
from Bio.SeqIO.FastaIO import FastaWriter
record = SeqIO.read("Chromosome.fas", "fasta")
with open("output.fas", "w") as out:
    writer = FastaWriter(out, wrap=80)
    writer.write_header()
    writer.write_record(record[100:500])
    writer.write_footer()
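Since the file in the question is huge, it may also be worth avoiding loading the whole chromosome into memory. The following is only a rough sketch of that idea in plain Python (no Biopython), under several assumptions: the input is an uncompressed FASTA file, only the first record is of interest, and start/end are 0-based and end-exclusive like the slices above. The function name extract_region and its parameters are made up for illustration.

```python
def extract_region(fasta_path, out_path, start, end, width=60):
    """Write bases [start, end) of the first record in fasta_path to out_path,
    wrapped at `width` characters per line. Returns the number of bases written."""
    pos = 0            # number of bases of the first record read so far
    header = None
    pieces = []        # only the requested bases are kept in memory
    with open(fasta_path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                if header is not None:
                    break              # a second record starts: stop reading
                header = line[1:]
                continue
            if header is None or not line:
                continue               # skip blank lines and text before the header
            line_start, line_end = pos, pos + len(line)
            pos = line_end
            if line_end <= start:
                continue               # line lies entirely before the region
            if line_start >= end:
                break                  # line lies entirely after the region
            # Keep only the part of this line that overlaps [start, end).
            pieces.append(line[max(start - line_start, 0):end - line_start])
    if header is None:
        raise ValueError("no FASTA record found")
    region = "".join(pieces)
    with open(out_path, "w") as out:
        out.write(">" + header + "\n")
        for i in range(0, len(region), width):
            out.write(region[i:i + width] + "\n")
    return len(region)


# Hypothetical usage with the file names and coordinates from the question:
# extract_region("full_chr.fa", "part.fa", 600130000, 602000000, width=60)
```

This still reads the file line by line up to the end coordinate, so it trades speed for memory; it does not validate the sequence characters or handle multiple records.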