Generating a DNA sequence that excludes a specific sequence
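I've just started learning to program with Python. In class we were asked to generate a random DNA sequence that does not contain a specific 6-letter sequence (AACGTT). The point is to write a function that always returns a legal sequence. Currently, my function generates a correct sequence about 78% of the time. How can I make it return a legal sequence 100% of the time? Any help is appreciated.
This is what my code looks like now: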
from random import choice

def generate_seq(length, enzyme):
    list_dna = []
    nucleotides = ["A", "C", "T", "G"]
    i = 0
    while i < 1000:
        nucleotide = choice(nucleotides)
        list_dna.append(nucleotide)
        i = i + 1
    dna = ''.join(str(nucleotide) for nucleotide in list_dna)
    return(dna)

seq = generate_seq(1000, "AACGTT")
if len(seq) == 1000 and seq.count("AACGTT") == 0:
    print(seq)
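For what it's worth, a success rate of about 78% is roughly what chance alone predicts. Here is a rough sketch of the arithmetic, assuming each of the 995 six-letter windows in a 1000-base sequence matches AACGTT independently with probability (1/4)^6. The independence is an approximation, and the variable names are only illustrative:

```python
# Back-of-the-envelope estimate, not an exact calculation.
seq_length = 1000
site_length = 6

# Number of positions where a 6-letter window can start.
windows = seq_length - site_length + 1

# Chance that one window spells exactly AACGTT with uniform random bases.
p_match = 0.25 ** site_length

# Approximate chance that no window matches (treats windows as independent).
p_clean = (1 - p_match) ** windows

print(f"{p_clean:.2f}")
```

So the unfiltered generator is expected to produce a legal sequence a bit under four times out of five, which fits what I am seeing.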
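One option is to check the last few entries inside the loop and only keep the newly appended letter if it has not completed a 'bad' sequence. However, this may make "AACGT" followed by a letter other than the final "T" more likely than it would be by true random chance.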
from random import choice

def generate_seq(length, enzyme):
    list_dna = []
    nucleotides = ["A", "C", "T", "G"]
    i = 0
    while i < length:
        nucleotide = choice(nucleotides)
        list_dna.append(nucleotide)
        # Check for the invalid sequence. If found, remove the last element and redraw.
        if ''.join(list_dna[-len(enzyme):]) == enzyme:
            list_dna.pop()
        else:
            i = i + 1
    dna = ''.join(list_dna)
    return dna

seq = generate_seq(1000, "AACGTT")
if len(seq) == 1000 and seq.count("AACGTT") == 0:
    print(seq)
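If that skew matters, one alternative (not the approach above, just a sketch) is to reject whole sequences instead of single letters: keep drawing complete random sequences until one happens to be legal. Every legal sequence is then equally likely. The name `generate_seq_rejection` is made up for this example, and the approach assumes legal sequences are common enough that the loop finishes quickly, which holds here since most draws are already legal:

```python
from random import choice

def generate_seq_rejection(length, enzyme):
    """Draw complete random sequences until one does not contain `enzyme`."""
    nucleotides = "ACGT"
    while True:
        dna = ''.join(choice(nucleotides) for _ in range(length))
        # Accept the sequence only if the forbidden site is absent.
        if enzyme not in dna:
            return dna

seq = generate_seq_rejection(1000, "AACGTT")
if len(seq) == 1000 and seq.count("AACGTT") == 0:
    print(seq)
```

The trade-off is wasted work: each rejected draw throws away a full sequence, so this is only reasonable when rejections are rare.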
One idea is to check whether the previous 5 nucleotides are equal to AACGT, and in that case choose only from ["A", "C", "G"].
from random import choice

def generate_seq(length, enzyme, bad_prefix="AACGT"):
    list_dna = []
    nucleotides = ["A", "C", "T", "G"]
    i = 0
    while i < length:
        # Join the last 5 entries into a string before comparing with bad_prefix;
        # comparing the list slice itself to a string would never be equal.
        if ''.join(list_dna[-5:]) != bad_prefix:
            nucleotide = choice(nucleotides)
        else:
            # A "T" here would complete AACGTT, so leave it out.
            nucleotide = choice(["A", "C", "G"])
        list_dna.append(nucleotide)
        i = i + 1
    dna = ''.join(list_dna)
    return dna

seq = generate_seq(1000, "AACGTT")
if len(seq) == 1000 and seq.count("AACGTT") == 0:
    print(seq)
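The same idea can be written so that the prefix and the excluded letter are derived from `enzyme` rather than hard-coded. This is only a sketch: `generate_seq_avoiding` is a hypothetical name, and it assumes `enzyme` is a non-empty string over A, C, G, T. The loop at the end repeats the check from the question many times to confirm that every result is legal:

```python
from random import choice

def generate_seq_avoiding(length, enzyme):
    """Build a sequence letter by letter, never completing `enzyme`."""
    nucleotides = "ACGT"
    # Everything except the last letter of the forbidden site, e.g. "AACGT".
    bad_prefix = enzyme[:-1]
    # Letters that are safe to add right after bad_prefix, e.g. A, C, G.
    safe = [n for n in nucleotides if n != enzyme[-1]]
    dna = ""
    for _ in range(length):
        # Restrict the choice only when the next letter could complete the site.
        pool = safe if dna.endswith(bad_prefix) else nucleotides
        dna += choice(pool)
    return dna

# Repeat the question's check over many independent runs.
results = [generate_seq_avoiding(1000, "AACGTT") for _ in range(200)]
all_ok = all(len(s) == 1000 and s.count("AACGTT") == 0 for s in results)
print(all_ok)
```

Because the forbidden site can only be completed by adding its last letter right after its first five, excluding that one letter in that one situation is enough to guarantee a legal result on every call.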