Java program malfunction

Question

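First half of my question: when I try to run my program, it loads forever and never displays a result. Could someone check my code and spot the error? The purpose of this program is to find a start DNA codon ATG, keep looking until it finds a stop codon TAA, TAG, or TGA, and then print the gene from start to stop. I am using BlueJ.

Second half of my question: I am supposed to write a program that performs the following steps: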

To find the first gene, find the start codon ATG.
Next look immediately past ATG for the first occurrence of each of the three stop codons TAG, TGA, and TAA.
If the length of the substring between ATG and any of these three stop codons is a multiple of three, then a candidate for a gene is the start codon through the end of the stop codon.
If there is more than one valid candidate, the smallest such string is the gene. The gene includes the start and stop codon.
If no start codon was found, then you are done.
If a start codon was found, but no gene was found, then start searching for another gene via the next occurrence of a start codon starting immediately after the start codon that didn't yield a gene.
If a gene was found, then start searching for the next gene immediately after this found gene.

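Note that according to this algorithm, for the string "ATGCTGACCTGATAG", ATGCTGACCTGATAG could be a gene, but ATGCTGACCTGA would not be, even though it is shorter, because another instance of 'TGA' is found first, and that instance is not a multiple of three away from the start codon.

In my assignment I was also asked to write these methods:

Specifically, to implement the algorithm, you should do the following.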

Write the method findStopIndex that has two parameters dna and index, where dna is a String of DNA and index is a position in the string. This method finds the first occurrence of each stop codon to the right of index. From those stop codons that are a multiple of three from index, it returns the smallest index position. It should return -1 if no stop codon was found and there is no such position. This method was discussed in one of the videos.
Write the void method printAll that has one parameter dna, a String of DNA. This method should print all the genes it finds in DNA. This method should repeatedly look for a gene, and if it finds one, print it and then look for another gene. This method should call findStopIndex. This method was also discussed in one of the videos.
Write the void method testFinder that will use the two small DNA example strings shown below. For each string, it should print the string, and then print the genes found in the string. Here is sample output that includes the two DNA strings:

Sample output is:

ATGAAATGAAAA

Gene found is:

ATGAAATGA

DNA string is:

ccatgccctaataaatgtctgtaatgtaga

Gene found is:

atgccctaa

atgtctgtaatgtag

DNA string is:

CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA

Gene found is:

ATGTAA

ATGAATGACTGATAG

ATGCTATGA

ATGTGA

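I thought it over and found that this code is close to working. I just need my output to produce the results required by the instructions. Hopefully this is not too confusing; I just don't know how to look for a stop codon after the start codon, and then how to extract the gene sequence. I would also like to understand how to get the closest gene sequence by working out which of the three stop codons (tag, tga, taa) is nearest to atg. I know this is a lot, but I hope it all makes sense.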

import edu.duke.*;
import java.io.*;

public class FindMultiGenes {
    public String findGenes(String dnaOri) {
        String gene = new String();
        String dna = dnaOri.toLowerCase();
        int start = -1;
        while(true){
            start = dna.indexOf("atg", start);
            if (start == -1) {
                break;
            }
            int stop = findStopCodon(dna, start); 
            if(stop > start){
                String currGene = dnaOri.substring(start, stop+3);

                System.out.println("From: " + start + " to " + stop + "Gene: "    
                +currGene);}
        }
        return gene;
    } 

    private int findStopCodon(String dna, int start){   
        for(int i = start + 3; i<dna.length()-3; i += 3){
            String currFrameString = dna.substring(i, i+3);

            if(currFrameString.equals("TAG")){
                return i;

            } else if( currFrameString.equals("TGA")){
                return i;

            } else if( currFrameString.equals("TAA")){
                return i;

            }
        }   
        return -1;
    }

    public void testing(){


        FindMultiGenes FMG = new FindMultiGenes();

        String dna =     
        "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";

        FMG.findGenes(dna);




        System.out.println("DNA string is: " + dna);

    } 
}

Answer 1

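Change your line start = dna.indexOf("atg", start); to

start = dna.indexOf("atg", start + 1);

What currently happens is that you find "atg" at index k, and in the next iteration you search the string for the next "atg" from k onwards. Since the start position is inclusive, the next match is found at exactly the same position. As a result you find the same index k over and over again and never stop.

By increasing the index by 1, you skip past the currently found index k and start searching for the next match from k+1.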
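The inclusive behaviour of indexOf can be seen in a small sketch. The class name IndexOfDemo, the helper findStarts, and the short test string are made up for illustration; only String.indexOf(String, int) comes from the standard library.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfDemo {
    // Hypothetical helper: collects every index at which "atg" occurs.
    // Searching from start + 1 skips the match that was just found.
    static List<Integer> findStarts(String dna) {
        List<Integer> starts = new ArrayList<>();
        int start = -1;
        while (true) {
            start = dna.indexOf("atg", start + 1);
            if (start == -1) {
                break; // no further start codon
            }
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        String dna = "catgtaatagatg";
        // fromIndex is inclusive: searching from 1 finds the match at 1 again.
        System.out.println(dna.indexOf("atg", 1)); // prints 1
        // Searching from 2 moves on to the next match.
        System.out.println(dna.indexOf("atg", 2)); // prints 10
        System.out.println(findStarts(dna));       // prints [1, 10]
    }
}
```

With this change alone the loop in the question terminates, but it still prints no genes: findStopCodon receives the lowercased string and compares it against uppercase literals such as "TAG", so those comparisons would need to be lowercase as well.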

Answer 2

This program is meant to find a start DNA codon ATG and keep looking until finding a stop codon TAA or TAG or TGA, and then print out the gene from start to stop.

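Since the first search always starts at 0, you can set the start index there, and then search for the stop codon from the result. Here I do it with a single stop codon: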

public static void main(String[] args) {

    String dna = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";
    String sequence = dna.toLowerCase();
    int index = 0;
    int newIndex = 0;
    while (true) {
        index = sequence.indexOf("atg", index);
        if (index == -1)
            return;
        newIndex = sequence.indexOf("tag", index + 3);
        if (newIndex == -1) // Check needed only if a stop codon is not guaranteed for each start codon.
            return;
        System.out.println("From " + (index + 3) + " to " + newIndex + " Gene: " + sequence.substring(index + 3, newIndex));
        index = newIndex + 3;
    }
}

Output:

From 4 to 7 Gene: taa
From 13 to 22 Gene: aatgactga

Alternatively, you can use a regular expression to do much of the work for you:

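import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {

    String dna = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";

    // The reluctant quantifier +? makes the group stop at the first TAG after each ATG.
    Pattern p = Pattern.compile("ATG([ATGC]+?)TAG");
    Matcher m = p.matcher(dna);

    // Group 1 is the sequence between the start codon and the stop codon.
    while (m.find())
        System.out.println("From " + m.start(1) + " to " + m.end(1) + " Gene: " + m.group(1));
}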

Output:

From 4 to 7 Gene: TAA
From 13 to 22 Gene: AATGACTGA
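Both versions above handle only TAG and ignore the reading frame. A minimal sketch of the full algorithm from the question, with all three stop codons and the multiple-of-three check, might look like the following. The class name GeneFinderSketch and the helper findAll are made up for illustration; findStopIndex and printAll follow the names in the assignment text, and the sketch assumes that "to the right of index" means searching from index + 3, just past the start codon.

```java
import java.util.ArrayList;
import java.util.List;

class GeneFinderSketch {
    // Takes the first occurrence of each stop codon past the start codon at
    // index and returns the smallest position that is a multiple of three
    // away from index, or -1 if none qualifies. Expects lowercase dna.
    static int findStopIndex(String dna, int index) {
        int best = -1;
        for (String stop : new String[] {"tag", "tga", "taa"}) {
            int pos = dna.indexOf(stop, index + 3);
            boolean inFrame = pos != -1 && (pos - index) % 3 == 0;
            if (inFrame && (best == -1 || pos < best)) {
                best = pos;
            }
        }
        return best;
    }

    // Hypothetical helper: returns the genes in order, in the original case.
    static List<String> findAll(String dnaOri) {
        List<String> genes = new ArrayList<>();
        String dna = dnaOri.toLowerCase();
        int start = 0;
        while (true) {
            start = dna.indexOf("atg", start);
            if (start == -1) {
                break; // no more start codons
            }
            int stop = findStopIndex(dna, start);
            if (stop == -1) {
                start += 3; // no gene here: resume just past this start codon
            } else {
                genes.add(dnaOri.substring(start, stop + 3));
                start = stop + 3; // resume just past the gene that was found
            }
        }
        return genes;
    }

    // Prints every gene found in dna, one per line.
    static void printAll(String dna) {
        for (String gene : findAll(dna)) {
            System.out.println(gene);
        }
    }

    public static void main(String[] args) {
        String dna = "ATGAAATGAAAA";
        System.out.println("DNA string is: " + dna);
        System.out.println("Gene found is:");
        printAll(dna); // prints ATGAAATGA
    }
}
```

Because each branch of the loop moves start forward by at least three characters, the search cannot get stuck on the same start codon.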


java

dna-sequence