Python: How to output FASTA headers with the chromosome coordinates of each window?
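I have code that slides a window of size 5 along each sequence from left to right. The input file is in FASTA format, where each header has the form >chromosome followed by the chromosome coordinates. I would like each output header to show the exact coordinates of its window instead of the coordinates of the whole record. Can anyone help me?
Code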
from Bio import SeqIO

with open("test1_out.fasta", "w") as f:
    for seq_record in SeqIO.parse("test1.fasta", "fasta"):
        for i in range(len(seq_record.seq) - 4):
            f.write(">" + str(seq_record.id) + "\n")
            f.write(str(seq_record.seq[i:i+5]) + "\n")
test1.fasta
>chr1:1-8
ATCGCGTC
>chr2:1-10
ATTTTCGCGA
Actual output
>chr1:1-8
ATCGC
>chr1:1-8
TCGCG
>chr1:1-8
CGCGT
>chr1:1-8
GCGTC
>chr2:1-10
ATTTT
>chr2:1-10
TTTTC
>chr2:1-10
TTTCG
>chr2:1-10
TTCGC
>chr2:1-10
TCGCG
>chr2:1-10
CGCGA
Expected output
>chr1:1-5
ATCGC
>chr1:2-6
TCGCG
>chr1:3-7
CGCGT
>chr1:4-8
GCGTC
>chr2:1-5
ATTTT
>chr2:2-6
TTTTC
>chr2:3-7
TTTCG
>chr2:4-8
TTCGC
>chr2:5-9
TCGCG
>chr2:6-10
CGCGA
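You only need to change how the header is written, inside the loop over the records:
seq_name = seq_record.id.split(":")[0]  # Get the name part, e.g. "chr1"
for i in range(len(seq_record.seq) - 4):
    seq_coords = "{}-{}".format(i + 1, i + 5)  # 1-based, inclusive window coordinates
    f.write(">" + seq_name + ":" + seq_coords + "\n")  # Write name and coordinates together
    f.write(str(seq_record.seq[i:i+5]) + "\n")  # The sequence line is unchanged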
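To check the header logic without touching any files, here is a minimal sketch in plain Python. The helper name `window_records` and its `size` parameter are made up for illustration and are not part of Biopython. It assumes the record id looks like "chr1:1-8" with 1-based, inclusive coordinates. Unlike the snippet above, it also reads the start coordinate from the header, which is an extra assumption in case a record does not start at position 1.

```python
def window_records(record_id, sequence, size=5):
    """Yield (header, window) pairs for every window of `size` bases.

    Assumption: record_id looks like "chr1:1-8" (name, colon, start-end).
    If there is no coordinate part, numbering starts at 1.
    """
    name, _, coords = record_id.partition(":")
    # Position of the first base of the record (1-based, inclusive).
    start = int(coords.split("-")[0]) if coords else 1
    for i in range(len(sequence) - size + 1):
        header = "{}:{}-{}".format(name, start + i, start + i + size - 1)
        yield header, sequence[i:i + size]


# The two records from test1.fasta, written out as plain strings.
records = [("chr1:1-8", "ATCGCGTC"), ("chr2:1-10", "ATTTTCGCGA")]

lines = []
for record_id, sequence in records:
    for header, window in window_records(record_id, sequence):
        lines.append(">" + header)
        lines.append(window)

print("\n".join(lines))
```

With Biopython you would pass `seq_record.id` and `str(seq_record.seq)` to the helper and write each pair to the output file instead of collecting them in a list. Sequences shorter than the window size simply produce no output.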