Counting the distinct words

How can I count the distinct words in a text that spans multiple lines?

Input data: You will read the text of the email from the keyboard. It can span multiple lines and contains only lowercase letters of the English alphabet and spaces.

Output data: A single integer representing the number of distinct words in the email will be displayed.

I have this code:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.*;

public class Main {
    public static void main(String[] args) throws Exception {
        Set<String> words = new TreeSet<>();
        int count = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                words.add(line);
                count++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(count);
    }
}

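Basically it works only for the sample test and is wrong for the others. As far as I understand, that is because it still reads an empty space. What can I fix?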

Change

words.add(line);

so that it adds the words from each line instead of the whole line. Split on whitespace and add the resulting tokens. Like,

words.addAll(Arrays.asList(line.split("\\s+")));
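One thing to watch with split: a line that starts with spaces, or a blank line, yields an empty token, which would then be counted as a word. Here is a minimal sketch of one way to filter those out. The class name SplitDemo and the helper wordsIn are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

class SplitDemo {
    // Hypothetical helper: returns the non-empty whitespace-separated tokens of one line.
    static List<String> wordsIn(String line) {
        List<String> result = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) { // skip the empty token from leading spaces or a blank line
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Leading whitespace produces an extra empty first token: length is 3, not 2.
        System.out.println(" a b".split("\\s+").length);
        // A blank line still produces one (empty) token: length is 1, not 0.
        System.out.println("".split("\\s+").length);
        // After filtering, only the real words remain.
        System.out.println(wordsIn("  hello world ").size());
    }
}
```

With a helper like this, the loop body could become words.addAll(wordsIn(line)).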

Then change

System.out.println(count);

to

System.out.println(words.size());

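Finally, eliminate count. It is not needed: the Set already "counts" its elements through the number of elements it holds, i.e. words.size(). And unless there is some reason you need your elements sorted, use a HashSet.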

Set<String> words = new TreeSet<>();

should be (as far as I can tell)

Set<String> words = new HashSet<>();
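Putting the changes together, the whole program might look roughly like the sketch below. It assumes the input format from the problem statement (lowercase letters and spaces, possibly over several lines). The class name DistinctWords and the method countDistinctWords are names chosen here for illustration. The trim() call and the isEmpty() check are an addition that goes beyond the changes listed above: they keep blank lines and leading spaces from adding an empty "word".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;

class DistinctWords {
    // Reads every line from the reader and returns the number of distinct words seen.
    static int countDistinctWords(BufferedReader reader) throws IOException {
        Set<String> words = new HashSet<>(); // a Set keeps each word only once
        String line;
        while ((line = reader.readLine()) != null) {
            // Split the line into whitespace-separated tokens.
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) { // a blank line yields one empty token; skip it
                    words.add(token);
                }
            }
        }
        return words.size(); // the set's size is the distinct-word count
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(System.in))) {
            System.out.println(countDistinctWords(reader));
        }
    }
}
```

Keeping the counting in its own method makes it easy to try on a fixed string (for example through a StringReader) before submitting.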