Counting bases in a DNA chain sequence
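How can I write Python code that reads a DNA sequence chain and returns a list of its repeated bases, describing three things for each one: which base it is (A, G, T or C), where it is in the chain, and how many times it repeats? For example: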
ACTTTTGTCTAAACCCCCGTCCTATATAATACT
The output for this would be: list_bases = [('T', 3, 4), ('A', 11, 3), ('C', 14, 6)]
Is this what you are looking for?
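DNA_seq = 'ACTTTTGTCTAAACCCCCCGTCCTATATATAACT'
# For each base: [1-based start position of its longest run, length of that run]
count_dic = {'A': [0,0], "G": [0,0], "C": [0,0], "T": [0,0]}
for i in range(len(DNA_seq)-1):
    j = i
    seq_count = 1
    # Extend the run while the next base matches; the bounds check avoids an
    # IndexError when the sequence ends in a run
    while j + 1 < len(DNA_seq) and DNA_seq[j] == DNA_seq[j+1]:
        seq_count += 1
        j += 1
        # Only runs of two or more reach this point; keep the longest per base
        if seq_count > count_dic[DNA_seq[i]][1]:
            count_dic[DNA_seq[i]][1] = seq_count
            count_dic[DNA_seq[i]][0] = i + 1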
The contents of count_dic are:
{'A': [11, 3], 'G': [0, 0], 'C': [14, 6], 'T': [3, 4]}
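The same runs can also be collected directly in the (base, position, count) tuple format the question asks for. The following is a minimal sketch using itertools.groupby; the function name find_runs and its min_length parameter are hypothetical names introduced here, and the threshold of 3 is an assumption chosen because it reproduces the three tuples in the question's expected output.

```python
from itertools import groupby

def find_runs(seq, min_length=2):
    """Return (base, 1-based start position, run length) for every run
    of identical bases that is at least min_length long."""
    runs = []
    position = 1  # 1-based position of the first base in the current group
    for base, group in groupby(seq):
        length = sum(1 for _ in group)  # number of consecutive identical bases
        if length >= min_length:
            runs.append((base, position, length))
        position += length
    return runs

DNA_seq = 'ACTTTTGTCTAAACCCCCCGTCCTATATATAACT'
# min_length=3 is an assumption; it matches the three runs listed in the question
list_bases = find_runs(DNA_seq, min_length=3)
print(list_bases)  # [('T', 3, 4), ('A', 11, 3), ('C', 14, 6)]
```

With the default min_length=2, the shorter runs ('C', 22, 2) and ('A', 31, 2) are reported as well, so the threshold decides whether every repeat or only the longer ones appear in the list.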
I did the following:
import re
from collections import defaultdict

seq = "ACTTTTGTCTAAACCCCCCGTCCTATATATAACT"
bases = ['A','G','C','T']
indexes = defaultdict(list)  # base -> list of 0-based positions
counts = dict()              # base -> total number of occurrences
for base in bases:
    comSeq = re.compile(base)
    matches = comSeq.findall(seq)
    count = len(matches)
    counts[base] = count
    start = 0
    # Locate each occurrence in turn, resuming the search just past the last hit
    for match in matches:
        index = seq.find(base, start)
        indexes[base].append(index)
        start = index + 1
print(indexes)
print(counts)
The dict indexes gives you every position (0-based) of each base in the chain:
{'A': [0, 10, 11, 12, 24, 26, 28, 30, 31], 'G': [6, 19], 'C': [1, 8, 13, 14,
15, 16, 17, 18, 21, 22, 32], 'T': [2, 3, 4, 5, 7, 9, 20, 23, 25, 27, 29, 33]}
The dict counts gives the number of times each base occurs in the chain:
{'A': 9, 'G': 2, 'C': 11, 'T': 12}
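The same two dictionaries can be built in a single pass without regular expressions. This is a rough sketch, assuming only the standard library; it uses collections.Counter for the totals and enumerate for the positions, and keeps the variable names from the code above.

```python
from collections import Counter, defaultdict

seq = "ACTTTTGTCTAAACCCCCCGTCCTATATATAACT"

# Counter tallies how often each character occurs in the string
counts = Counter(seq)

# Walk the sequence once, recording the 0-based position of every base
indexes = defaultdict(list)
for position, base in enumerate(seq):
    indexes[base].append(position)

print(dict(counts))
print(dict(indexes))
```

Unlike the loop over a fixed bases list, this version also records any unexpected character (for example N) that appears in the sequence, which may or may not be what you want.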
This may not be the best or most efficient code, and I'm not sure exactly what you are asking for, but I hope it helps.
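If the goal is the runs of repeated bases rather than every single occurrence, the re module can find them with a backreference. The sketch below is one possible approach, not the only one; the pattern and the name run_pattern are introduced here for illustration, and positions are reported 1-based to match the question's example.

```python
import re

seq = "ACTTTTGTCTAAACCCCCCGTCCTATATATAACT"

# ([AGTC]) captures one base; \1+ requires that same base to repeat
# at least once more, so only runs of two or more are matched
run_pattern = re.compile(r"([AGTC])\1+")

list_bases = [
    (m.group(1), m.start() + 1, len(m.group(0)))  # (base, 1-based start, run length)
    for m in run_pattern.finditer(seq)
]
print(list_bases)
# [('T', 3, 4), ('A', 11, 3), ('C', 14, 6), ('C', 22, 2), ('A', 31, 2)]
```

Changing \1+ to \1{2,} would restrict the matches to runs of three or more, which leaves only the three tuples shown in the question.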