Renaming interleaved fastq headers with biopython
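For ease of use and compatibility with another downstream pipeline, I'm trying to rename the fastq sequence IDs with Biopython. For example, starting from headers that look like this: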
@D00602:32:H3LN7BCXX:1:1101:1205:2112 OP:i:1
@D00602:32:H3LN7BCXX:1:1101:1205:2112 OP:i:2
@D00602:32:H3LN7BCXX:1:1101:1182:2184 OP:i:1
@D00602:32:H3LN7BCXX:1:1101:1182:2184 OP:i:2
to headers that look like this:
@000000000000001 OP:i:1
@000000000000001 OP:i:2
@000000000000002 OP:i:1
@000000000000002 OP:i:2
I have some code, but I can't seem to get the header count to repeat for each pair (i.e. 1, 1, 2, 2, 3, 3, etc.).
Any help would be appreciated. Thanks.
from Bio import SeqIO
import sys

FILE = sys.argv[1]
#Initialize numbering system at one
COUNT = 1
#Create a new list for the renamed records
new_records = []
for seq_record in SeqIO.parse(FILE, "fastq"):
    header = '{:0>15}'.format(COUNT)
    COUNT += 1
    print(header)
    #Drop the old ID from the description, keeping the "OP:i:N" comment
    seq_record.description = seq_record.description.replace(seq_record.id, "")
    seq_record.id = header
    new_records.append(seq_record)
SeqIO.write(new_records, FILE, "fastq")
*The seq_record ID does not include the "OP:i:1" information.
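To see why the ID alone lacks the "OP:i:1" tag, here is a small plain-Python sketch (no Biopython needed). It mimics, as I understand it, how SeqIO's "fastq" parser splits a title line: .id gets the first whitespace-delimited word and .description gets the whole line. Treat that split as an assumption and check it against your Biopython version.

```python
# Plain-Python mimic of how (I assume) Bio.SeqIO's "fastq" parser splits the
# title line after the leading "@": .id is the first whitespace-delimited word,
# .description is the whole line, so "OP:i:1" only appears in .description.
title = "D00602:32:H3LN7BCXX:1:1101:1205:2112 OP:i:1"

record_id = title.split(None, 1)[0]  # what seq_record.id would hold
description = title                  # what seq_record.description would hold

# Removing the ID from the description leaves just the comment field.
comment = description.replace(record_id, "").strip()

print(record_id)  # D00602:32:H3LN7BCXX:1:1101:1205:2112
print(comment)    # OP:i:1
```

This is why the code replaces seq_record.id inside seq_record.description rather than building the comment from the ID.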
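Assuming every label should be repeated (here, twice per read pair), all you have to do is divide the count by the number of repeats and round the result down (integer division with //), as shown below. Starting COUNT at 0 means COUNT//2+1 produces 1, 1, 2, 2, 3, 3, and so on.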
from Bio import SeqIO
import sys

FILE = sys.argv[1]
#Start the record counter at zero; COUNT//2+1 turns 0, 1, 2, 3, ... into 1, 1, 2, 2, ...
COUNT = 0
#Create a new list for the renamed records
new_records = []
for seq_record in SeqIO.parse(FILE, "fastq"):
    header = '{:0>15}'.format(COUNT//2+1)
    COUNT += 1
    print(header)
    #Drop the old ID from the description, keeping the "OP:i:N" comment
    seq_record.description = seq_record.description.replace(seq_record.id, "")
    seq_record.id = header
    new_records.append(seq_record)
SeqIO.write(new_records, FILE, "fastq")
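The COUNT//2+1 trick relies on every read appearing exactly twice and in order. A possible alternative, assuming only that records sharing an ID sit next to each other (as in an interleaved file), is to bump the number whenever the original ID changes. renumber_ids below is a hypothetical helper written for illustration, working on plain ID strings; inside the SeqIO loop you would apply the same comparison to seq_record.id before overwriting it.

```python
def renumber_ids(ids, width=15):
    """Hypothetical helper: map read IDs to zero-padded numbers that only
    increase when the ID changes, e.g. A, A, B, B -> 1, 1, 2, 2.

    Assumes records that share an ID are adjacent (as in an interleaved file);
    it does not assume exactly two records per ID.
    """
    new_ids = []
    previous = None
    number = 0
    for read_id in ids:
        if read_id != previous:
            # A new read ID starts the next number.
            number += 1
            previous = read_id
        new_ids.append(f"{number:0{width}d}")
    return new_ids


ids = [
    "D00602:32:H3LN7BCXX:1:1101:1205:2112",
    "D00602:32:H3LN7BCXX:1:1101:1205:2112",
    "D00602:32:H3LN7BCXX:1:1101:1182:2184",
    "D00602:32:H3LN7BCXX:1:1101:1182:2184",
]

new_ids = renumber_ids(ids)
# For strictly paired input this matches the COUNT//2+1 approach.
pair_based = ["{:0>15}".format(i // 2 + 1) for i in range(len(ids))]

print(new_ids)  # ['000000000000001', '000000000000001', '000000000000002', '000000000000002']
```

The trade-off: integer division needs no state beyond the counter but silently misnumbers if a mate is missing, while comparing IDs keeps numbering tied to the original read names.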