How to iterate over sections of a string while also analyzing the up and downstream flanking regions?

Question

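I want to run a sliding window in Python to examine a DNA sequence (between 2,000 and 4,000 base pairs long) in frames of 120 base pairs. However, I also want to take into account roughly 20 nucleotides flanking the 120-base-pair frame on both the upstream and downstream sides. If the sliding window is at, say, position 14 or position 1992 of a 2,000-base-pair sequence, the upstream or downstream flank will obviously have to be shorter than 20 base pairs.

So far, my code looks like this: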

from Bio import SeqIO

# Bio.Alphabet was removed in Biopython 1.78, so no alphabet argument is passed here.
fasta = SeqIO.to_dict(SeqIO.parse("RD4.fasta", "fasta"))

# Take the first record; dict views cannot be indexed directly in Python 3.
sequence = list(fasta.values())[0].seq
print(sequence)
# Example sequence used while testing:
sequence = "TGTGAATTCATACAAGCCGTAGTCGTGCAGAAGCGCAACACTCTTGGAGTGGCCTACAACGGCGCTCTCCGCGGCGCGGGCGTACCGGATATCTTAGCTGGTCAATAGCCATTTTTCAGCAATTTCTCAGTAACGCTACGGG"


target_length = 120
for position in range(len(sequence) - target_length + 1):
    stop = position + target_length
    potential_target_frame = str(sequence[position:stop])
    if position < 20:
        # Fewer than 20 bases exist before the window.
        upstream_flank = sequence[:position]
        downstream_flank = sequence[stop:stop+20]
    elif len(sequence) - stop < 20:
        # Fewer than 20 bases exist after the window.
        upstream_flank = sequence[position-20:position]
        downstream_flank = sequence[stop:]
    else:
        upstream_flank = sequence[position-20:position]
        downstream_flank = sequence[stop:stop+20]
    print("upstream flank is " + upstream_flank)
    print("downstream flank is " + downstream_flank)

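Although this code looks logically sound, the print output suggests something is wrong with its design: only the downstream flank is printed, not the upstream flank.

Is there something wrong with how I set up my conditional tree, or with how I am slicing the original sequence?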

Answer 1

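It turns out that I had set up the conditional tree incorrectly. Because I am dealing with two different parts of the string, and each part can be in one of three states (length greater than 20, less than 20, or equal to 0), the conditional tree needs 3^2 = 9 branches. When the upstream or downstream flank has length zero, I set its variable to an empty string.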
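To make the counting concrete, here is a minimal sketch that enumerates the combinations. The helper name `flank_state` and the state labels are hypothetical, and treating exactly 20 available bases as "full" is an assumption made for illustration.

```python
from itertools import product

FLANK = 20  # flank size taken from the question


def flank_state(available):
    """Classify how many bases are available on one side of the window.

    Hypothetical helper: exactly FLANK bases is treated as "full".
    """
    if available == 0:
        return "empty"
    if available < FLANK:
        return "partial"
    return "full"


states = ["empty", "partial", "full"]
# Every (upstream, downstream) pairing of the three states.
combinations = list(product(states, repeat=2))

print(len(combinations))
for upstream_state, downstream_state in combinations:
    print(upstream_state, downstream_state)
```

Not every pairing can occur for every sequence. For example, both flanks are "empty" only when the sequence is exactly as long as the window.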

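The code should be set up like this (I condensed it slightly from the code above and changed how the upstream and downstream parts are calculated):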

target_length = 120
for position in range(len(sequence) - target_length + 1):
    stop = position + target_length
    potential_target_frame = str(sequence[position:stop])
    if len(sequence[:position]) == 0 and len(sequence[stop:]) > 20:
        # No upstream bases at all: use an empty string.
        upstream_flank = ""
        downstream_flank = sequence[stop:stop+20]
        print("upstream flank is " + upstream_flank)
        print("downstream flank is " + downstream_flank)
    elif 0 < len(sequence[:position]) < 20 and len(sequence[stop:]) > 20:
        # Some, but fewer than 20, upstream bases: take everything before the window.
        upstream_flank = sequence[:position]
        downstream_flank = sequence[stop:stop+20]
        print("upstream flank is " + upstream_flank)
        print("downstream flank is " + downstream_flank)
    # The remaining combinations of upstream and downstream states
    # would be written out as further elif branches here.
    else:
        upstream_flank = sequence[position-20:position]
        downstream_flank = sequence[stop:stop+20]
        print("upstream flank is " + upstream_flank)
        print("downstream flank is " + downstream_flank)
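As an alternative to writing out every branch, the following is a minimal sketch that relies on Python's slicing rules instead. It is not the code from the answer above: the function name `window_with_flanks`, its parameters, and the toy sequence are all hypothetical. It assumes the input can be converted with `str()`, and it clamps the upstream start at 0 because a negative start index would otherwise count from the end of the string.

```python
def window_with_flanks(sequence, target_length=120, flank=20):
    """Yield (position, upstream, frame, downstream) for each window.

    Hypothetical helper; flanks are shortened automatically near the ends.
    """
    sequence = str(sequence)
    for position in range(len(sequence) - target_length + 1):
        stop = position + target_length
        # Clamp at 0 so the start index never goes negative.
        upstream = sequence[max(0, position - flank):position]
        # Slicing past the end of a string is safe: the result is truncated.
        downstream = sequence[stop:stop + flank]
        yield position, upstream, sequence[position:stop], downstream


# Toy data (hypothetical): 20 bases, 8-base window, 3-base flanks.
toy = "ACGT" * 5
rows = list(window_with_flanks(toy, target_length=8, flank=3))
for position, upstream, frame, downstream in rows:
    print(position, repr(upstream), frame, repr(downstream))
```

With this approach a flank of length zero comes out as an empty string without any special case, so the same loop body handles the start, middle, and end of the sequence.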

python

conditional

slice