Get ID from a FASTA file with Biopython
I am using Biopython to get some information from a FASTA file of nucleotide sequences.
However, I only want the ID and the sequence of each record.
I have this code:
from Bio import SeqIO

for seq_nucleotides in SeqIO.parse("CDS_sequence.txt", "fasta"):
    print(seq_nucleotides.description)
    print(seq_nucleotides.seq)
which prints this:
lcl|NC_000913.3_cds_NP_414584.1_1 [gene=fixB] [locus_tag=b0042] [db_xref=UniProtKB/Swiss-Prot:P31574] [protein=putative electron transfer flavoprotein FixB] [protein_id=NP_414584.1] [location=43188..44129] [gbkey=CDS]
ATGAACACGTTTCTCAAGTCTGGGTATTCGCGATACCCCTTCTCGTCTGCCGGAACTGATGAACGGTGCGCAGGCTTTAGCTAATCAAATCAACACCTTTGTCCTCAATGATGCCGACGGCGCACAGGCAATCCAGCTCGGCGCTAATCATGTCTGGAAATTAAACGGCAAACCGGACGATCGGATGATCGAAGATTACGCCG ...
I only need the protein_id (NP_414584.1) and the sequence, but I don't know how to extract the ID from the description string to get something like this:
NP_414584.1
ATGAACACGTTTCTCAAGTCTGGGTATTCGCGAT...
from Bio import SeqIO

def sequence_extract_fasta(fasta_files):
    # Lists that collect the ID and the sequence of each FASTA record
    fasta_id = []
    fasta_seq = []
    # Open the FASTA file at the given path
    with open(fasta_files, 'r') as fasta_file:
        # Parse every record in the file with Biopython
        for record in SeqIO.parse(fasta_file, 'fasta'):  # (file handle, file format)
            # record.id is the first whitespace-delimited token of the header line
            fasta_seq.append(record.seq)
            fasta_id.append(record.id)
    return fasta_id, fasta_seq
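Note that for the header shown in the question, record.id would be lcl|NC_000913.3_cds_NP_414584.1_1 rather than NP_414584.1, because Biopython takes the ID from the first word of the header line. One possible way to get the protein_id itself is to pull it out of record.description with a regular expression. The following is only a sketch, assuming every header carries a tag of the form [protein_id=...]; the names extract_protein_id and iter_protein_ids_and_sequences are hypothetical helpers, not part of Biopython.

```python
import re

# Matches the bracketed tag "[protein_id=...]" and captures the value.
# Assumption: the value itself never contains a closing bracket.
PROTEIN_ID_RE = re.compile(r"\[protein_id=([^\]]+)\]")


def extract_protein_id(description):
    """Return the protein_id found in a FASTA description, or None if absent."""
    match = PROTEIN_ID_RE.search(description)
    return match.group(1) if match else None


def iter_protein_ids_and_sequences(fasta_path):
    """Yield (protein_id, sequence string) for each record in a FASTA file."""
    # Imported here so that extract_protein_id can be used without Biopython.
    from Bio import SeqIO

    for record in SeqIO.parse(fasta_path, "fasta"):
        yield extract_protein_id(record.description), str(record.seq)
```

With the file from the question, looping over iter_protein_ids_and_sequences("CDS_sequence.txt") and printing each pair should give the protein_id on one line and the sequence on the next. Records without a protein_id tag come back with None as the ID, so the caller can decide whether to skip them.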