Counting the distinct words

Question

How do I count the distinct words in a text that spans multiple lines?

Input data: You will read the text of the email from the keyboard. It can span multiple lines and contains only lowercase letters of the English alphabet and spaces.

Output data: A single integer representing the number of distinct words in the email will be displayed.

I have this code:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.*;

public class Main {
    public static void main(String[] args) throws Exception {
        Set<String> words = new TreeSet<>();
        int count = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                words.add(line);
                count++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(count);
    }
}

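Basically it only works for the sample test and is wrong for the others. As far as I understand, that is because it still reads an empty space. What can I fix?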

Answer 1

Change

words.add(line);

to add the words in each line. Split on whitespace, and add the resulting tokens. Like,

words.addAll(Arrays.asList(line.split("\\s+")));
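One edge case is worth checking when splitting this way: in Java, `split("\\s+")` keeps an empty leading token when the line starts with whitespace, and an empty line yields a single empty string, so `""` could end up counted as a word. A minimal sketch of one way to guard against that follows; the class `SplitDemo` and the helper `tokens` are hypothetical names used only for illustration.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: splits a line into words, ignoring leading and
    // trailing whitespace, and returns an empty array for a blank line.
    static String[] tokens(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Plain split keeps an empty first token for a line with leading spaces.
        System.out.println(Arrays.toString("  a  b".split("\\s+")));
        // Trimming first removes that empty token.
        System.out.println(Arrays.toString(tokens("  a  b")));
        // A blank line produces no tokens at all.
        System.out.println(tokens("").length);
    }
}
```

Whether this matters depends on the input: if lines never start with a space and are never blank, the plain split is enough.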

Then change

System.out.println(count);

to

System.out.println(words.size());

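Finally, eliminate count. It isn't needed: the Set already "counts" its elements, because the number of distinct words is simply the number of elements in the collection, that is, words.size(). And, unless there is some reason you need your elements sorted, use a HashSet.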

Set<String> words = new TreeSet<>();

should (as far as I can tell) be

Set<String> words = new HashSet<>();
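Putting the changes together, the whole program might look like the sketch below. It assumes the input format from the question (lowercase letters and spaces, possibly over several lines). The class name `DistinctWords` and the helper `countDistinct` are hypothetical; the counting is moved into a method that takes a `Reader` only so it can be tried without typing on the keyboard.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.HashSet;
import java.util.Set;

class DistinctWords {
    // Hypothetical helper: counts distinct whitespace-separated words
    // read from the given Reader until end of input.
    static int countDistinct(Reader in) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String token : line.split("\\s+")) {
                    // Skip the empty token produced by blank lines
                    // or by lines that start with whitespace.
                    if (!token.isEmpty()) {
                        words.add(token);
                    }
                }
            }
        }
        // The set holds each word once, so its size is the answer.
        return words.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countDistinct(new InputStreamReader(System.in)));
    }
}
```

For a quick check, `countDistinct(new java.io.StringReader("hello world\n\n  hello there "))` returns 3, since the distinct words are hello, world, and there.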
