Char count of each token in Tokenized String, Java

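I'd like to know whether I can count the characters in each token and display that information. For example:

If "day" is tokenized, my output would be: "day has 3 characters." and so on for every token.

My last loop, which is supposed to print the number of characters in each token, never prints anything: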

public static void main(String[] args) {

    Scanner sc = new Scanner(System.in);

    ArrayList<String> tokenizedInput = new ArrayList<>();
    String sentenceRetrieved;

    // getting the sentence from the user
    System.out.println("Please type a sentence containing at least 4 words, with a maximum of 8 words: ");
    sentenceRetrieved = sc.nextLine();
    StringTokenizer strTokenizer = new StringTokenizer(sentenceRetrieved);

    // checking to ensure the string has 4-8 words
    while (strTokenizer.hasMoreTokens()) {
        if (strTokenizer.countTokens() > 8) {
            System.out.println("Please re-enter a sentence with at least 4 words, and a maximum of 8");
            break;

        } else {
            while (strTokenizer.hasMoreTokens()) {
                tokenizedInput.add(strTokenizer.nextToken());
            }

            System.out.println("Thank you.");
            break;
        }
    }

    // printing out the sentence
    System.out.println("You entered: ");
    System.out.println(sentenceRetrieved);

    // print out each word given
    System.out.println("Each word in your sentence is: " + tokenizedInput);

    // count the characters in each word
    // doesn't seem to run

    int totalLength = 0;
    while (strTokenizer.hasMoreTokens()) {
        String token;
        token = sentenceRetrieved;
        token = strTokenizer.nextToken();
        totalLength += token.length();
        System.out.println("Word: " + token + " Length:" + token.length());
    }

}

}

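Console example:

Please type a sentence containing at least 4 words, with a maximum of 8 words:

Hello there this is a test

Thank you.

You entered:

Hello there this is a test

Each word in your sentence is: [Hello, there, this, is, a, test]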

First, I added the necessary imports and wrapped the main method in a class. This should compile.

import java.util.ArrayList;
import java.util.Scanner;
import java.util.StringTokenizer;

public class SOQ_20200913_1
{

   public static void main(String[] args) {
   
      Scanner sc = new Scanner(System.in);
   
      ArrayList<String> tokenizedInput = new ArrayList<>();
      String sentenceRetrieved;
   
    // getting the sentence from the user
      System.out.println("Please type a sentence containing at least 4 words, with a maximum of 8 words: ");
      sentenceRetrieved = sc.nextLine();
      StringTokenizer strTokenizer = new StringTokenizer(sentenceRetrieved);
   
    // checking to ensure the string has 4-8 words
      while (strTokenizer.hasMoreTokens()) {
         if (strTokenizer.countTokens() > 8) {
            System.out.println("Please re-enter a sentence with at least 4 words, and a maximum of 8");
            break;
         
         } else {
            while (strTokenizer.hasMoreTokens()) {
               tokenizedInput.add(strTokenizer.nextToken());
            }
         
            System.out.println("Thank you.");
            break;
         }
      }
   
    // printing out the sentence
      System.out.println("You entered: ");
      System.out.println(sentenceRetrieved);
   
    // print out each word given
      System.out.println("Each word in your sentence is: " + tokenizedInput);
   
    // count the characters in each word
    // doesn't seem to run
   
      int totalLength = 0;
      while (strTokenizer.hasMoreTokens()) {
         String token;
         token = sentenceRetrieved;
         token = strTokenizer.nextToken();
         totalLength += token.length();
         System.out.println("Word: " + token + " Length:" + token.length());
      }
   
   }

}

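Next, let's look at this compilable example. Everything is fine up until your last while loop (the one that computes the character lengths). Notice, though, that the while loop before it keeps looping until the tokenizer has no more tokens to hand out. A StringTokenizer is consumed as you read from it, so once that loop has collected every token, none are left. Your final while loop then asks the same tokenizer for more tokens, but hasMoreTokens() already returns false, so the loop body never runs.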
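The exhaustion can be seen in isolation with a small sketch. The class name `TokenizerExhaustionDemo` and the helper `drain` are made up for illustration; only `StringTokenizer` and its methods come from the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerExhaustionDemo {

    // Hypothetical helper: consumes every remaining token into a list.
    static List<String> drain(StringTokenizer tokenizer) {
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        StringTokenizer tokenizer = new StringTokenizer("Hello there this is a test");

        // countTokens() reports how many tokens remain without consuming them.
        System.out.println(tokenizer.countTokens());   // 6

        List<String> tokens = drain(tokenizer);
        System.out.println(tokens);

        // The tokenizer is a one-pass cursor: nothing is left after draining it.
        System.out.println(tokenizer.hasMoreTokens()); // false
        System.out.println(tokenizer.countTokens());   // 0
    }
}
```

Any second loop guarded by `hasMoreTokens()` on the same tokenizer is therefore skipped entirely.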

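Finally, to fix this, don't read from the tokenizer a second time. Instead, make the last loop iterate over the list you already filled in the second-to-last while loop.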

For example:

  int totalLength = 0;

  for (String each : tokenizedInput) {

     totalLength += each.length();
     System.out.println("Word: " + each + " Length:" + each.length());

  }
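Putting the pieces together, a minimal sketch of the whole flow might look like the following. The class name `TokenLengths` and the helpers `tokenize` and `describe` are hypothetical, and the explicit 4-to-8 word check is an assumption about what the prompt intends (the original code only rejects sentences with more than 8 words).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.StringTokenizer;

class TokenLengths {

    // Hypothetical helper: reads every token once and keeps them in a list.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(sentence);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Hypothetical helper: one report line per token, built from the stored list.
    static List<String> describe(List<String> tokens) {
        List<String> lines = new ArrayList<>();
        for (String each : tokens) {
            lines.add("Word: " + each + " Length:" + each.length());
        }
        return lines;
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.println("Please type a sentence containing at least 4 words, with a maximum of 8 words: ");
        String sentence = sc.hasNextLine() ? sc.nextLine() : "";
        List<String> tokens = tokenize(sentence);

        // Assumed rule: accept only sentences of 4 to 8 words.
        if (tokens.size() < 4 || tokens.size() > 8) {
            System.out.println("Please re-enter a sentence with at least 4 words, and a maximum of 8");
            return;
        }

        // Iterate over the list, not the (already exhausted) tokenizer.
        for (String line : describe(tokens)) {
            System.out.println(line);
        }
    }
}
```

Because the tokens live in a list, they can be traversed as many times as needed, which the tokenizer itself does not allow.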