Word usage counter that ignores special characters

Question

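The code I found treats special characters as part of words, so the same word with and without punctuation is counted as two different words and I don't get an accurate result. How can I integrate a special-character replacer or something similar into my existing code?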

Here is the sample output I currently get:

What I want is "This = 1, is = 1, a = 1, Testing = 1, test = 2, How = 1, so = 1".

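import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class WordCounter {
    // Reads the file token by token and increments each token's count in the map
    static void countEachWrds(String fileName, Map<String, Integer> words) throws FileNotFoundException
    {
        // The default delimiter is whitespace, so "test." and "test" are different tokens
        Scanner file = new Scanner(new File(fileName));
        while (file.hasNext())
        {
            String word = file.next();
            Integer count = words.get(word);
            if (count != null)
                count++;
            else
                count = 1;
            words.put(word, count);
        }
        file.close();
    }

    public static void main(String[] args) throws FileNotFoundException
    {
        Map<String, Integer> words = new HashMap<String, Integer>();
        // Backslashes must be escaped inside a Java string literal
        countEachWrds("C:\\Users\\user\\Documents\\wordCounter.txt", words);
        System.out.println(words);
    }
}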

Answer 1

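Changing the Scanner's delimiter, i.e. Scanner file = new Scanner(new File(fileName)).useDelimiter("\\W+"), should fix this: the regex \W+ matches any run of non-word characters, so punctuation is treated as a separator instead of becoming part of a word.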
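Applied to the question's own method, the fix really is that one line. The sketch below keeps everything else from the question unchanged (the class name DelimiterWordCounter is hypothetical, used only to keep the example separate). Note that in Java regex \W means [^a-zA-Z_0-9] by default, so digits and underscores stay inside words, while an apostrophe splits "don't" into "don" and "t" and accented letters such as "é" act as separators.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.Scanner;

// Hypothetical class name; the method body is the question's code with one changed line.
public class DelimiterWordCounter {
    static void countEachWrds(String fileName, Map<String, Integer> words) throws FileNotFoundException {
        // "\\W+": any run of characters outside [a-zA-Z_0-9] separates tokens,
        // so "test." and "test" both produce the token "test"
        Scanner file = new Scanner(new File(fileName)).useDelimiter("\\W+");
        while (file.hasNext()) {
            String word = file.next();
            Integer count = words.get(word);
            if (count != null)
                count++;
            else
                count = 1;
            words.put(word, count);
        }
        file.close();
    }
}
```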

However, there are a few other things that could be improved:

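- Refactor countEachWrds so that it returns a new map and takes the Scanner as its only parameter.
- Use try-with-resources in the main method when creating the Scanner, and don't close it in countEachWrds, since that is not the method's responsibility.
- Use Map::merge to fill the word-frequency map more concisely.
- Create a LinkedHashMap instead of a HashMap to preserve insertion order.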
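The Map::merge and LinkedHashMap points can be checked side by side. This sketch (class and method names are hypothetical) counts a hard-coded token list, the tokens the "\\W+" delimiter yields for the sample input further down, once with the question's get/put pattern and once with merge into a LinkedHashMap.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeDemo {
    // The question's pattern: look up the count, add one, put it back.
    static Map<String, Integer> countWithGetPut(List<String> tokens) {
        Map<String, Integer> words = new HashMap<>();
        for (String word : tokens) {
            Integer count = words.get(word);
            words.put(word, count == null ? 1 : count + 1);
        }
        return words;
    }

    // Map::merge stores 1 if the key is absent, otherwise it combines the
    // old value and 1 with Integer::sum. LinkedHashMap keeps first-seen order.
    static Map<String, Integer> countWithMerge(List<String> tokens) {
        Map<String, Integer> words = new LinkedHashMap<>();
        for (String word : tokens) {
            words.merge(word, 1, Integer::sum);
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("This", "is", "a", "test", "Testing", "How", "test", "so");
        System.out.println(countWithGetPut(tokens).equals(countWithMerge(tokens))); // true
        System.out.println(countWithMerge(tokens)); // {This=1, is=1, a=1, test=2, Testing=1, How=1, so=1}
    }
}
```

Both maps hold the same counts (Map.equals compares entries, not the implementation class); only the LinkedHashMap guarantees that printing lists the words in the order they first appeared, since HashMap's iteration order is unspecified.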

With that said, the refactored code could look like this:

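import java.io.File;
import java.io.FileNotFoundException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

public class WordCounter {
    // Counts how often each token produced by the scanner occurs;
    // LinkedHashMap keeps the words in the order they were first seen
    static Map<String, Integer> countWords(Scanner scanner) {
        Map<String, Integer> words = new LinkedHashMap<>();
        while (scanner.hasNext()) {
            // Insert 1 for a new word, otherwise add 1 to the existing count
            words.merge(scanner.next(), 1, Integer::sum);
        }
        return words;
    }

    // calling from main
    public static void main(String[] args) throws FileNotFoundException {
        // "\\W+" treats every run of non-word characters as a delimiter;
        // try-with-resources closes the Scanner (and the file) automatically
        try (Scanner scanner = new Scanner(new File("sample.txt")).useDelimiter("\\W+")) {
            Map<String, Integer> map = countWords(scanner);
            System.out.println(map);
        }
    }
}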

Input file:

This is a test.
Testing.
How test so?

Output:

{This=1, is=1, a=1, test=2, Testing=1, How=1, so=1}
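The question originally asked about a special-character "replacer". If you would rather keep Scanner's default whitespace delimiter, a sketch of that alternative (a swapped-in technique, not part of the answer above; StripWordCounter is a hypothetical name) strips everything except Unicode letters and digits from each token with String.replaceAll and skips tokens that end up empty. Because countWords takes a Scanner, it can run on an in-memory String instead of a file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical alternative: whitespace delimiter plus per-token cleanup.
public class StripWordCounter {
    static Map<String, Integer> countWords(Scanner scanner) {
        Map<String, Integer> words = new LinkedHashMap<>();
        while (scanner.hasNext()) {
            // \p{L} = any Unicode letter, \p{N} = any Unicode digit; remove everything else
            String word = scanner.next().replaceAll("[^\\p{L}\\p{N}]", "");
            if (!word.isEmpty()) { // a token like "--" becomes empty; skip it
                words.merge(word, 1, Integer::sum);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "This is a test.\nTesting.\nHow test so?";
        try (Scanner scanner = new Scanner(text)) {
            System.out.println(countWords(scanner)); // {This=1, is=1, a=1, test=2, Testing=1, How=1, so=1}
        }
    }
}
```

Unlike "\\W+", this keeps "don't" as one token ("dont") and keeps accented letters such as "é" inside words; which behaviour is preferable depends on the text being counted.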


Tags: java, special-characters