containsKey method of TreeMap returns false even though the key is already in the Map
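I am trying to write a program that counts all the words in a text file.
Every word that matches a pattern is put into a TreeMap.
I get the text file through args[0].
For example, the text file contains this text: The Project Gutenberg EBook of The Complete Works of William Shakespeare
The condition that checks whether the TreeMap already contains the word returns false for the second occurrence of the word The, but returns true for the second occurrence of the word of.
I don't understand why...
Here is my code: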
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCount
{
    public static void main(String[] args)
    {
        // Charset charset = Charset.forName("UTF-8");
        // Locale locale = new Locale("en", "US");
        Path p0 = Paths.get(args[0]); // input text file
        Path p1 = Paths.get(args[1]); // not used in this excerpt
        Path p2 = Paths.get(args[2]); // not used in this excerpt
        Pattern pattern1 = Pattern.compile("[a-zA-Z]"); // word contains at least one letter
        Matcher matcher;
        Pattern pattern2 = Pattern.compile("'."); // apostrophe followed by any character
        Map<String, Integer> alphabetical = new TreeMap<String, Integer>();
        try (BufferedReader reader = Files.newBufferedReader(p0))
        {
            String line = null;
            while ((line = reader.readLine()) != null)
            {
                // System.out.println(line);
                // "\\s" is the regex for one whitespace character ("\s" is not a regex escape in a Java string)
                for (String word : line.split("\\s"))
                {
                    matcher = pattern1.matcher(word);
                    boolean found = matcher.find();
                    if (found)
                    {
                        String key = word.toLowerCase();
                        boolean check = alphabetical.containsKey(key);
                        if (!check)
                            alphabetical.put(key, 1);
                        else
                            alphabetical.put(key, alphabetical.get(key).intValue() + 1);
                    }
                    else
                    {
                        matcher = pattern2.matcher(word);
                        found = matcher.find();
                        if (found)
                        {
                            // Drop the first character and use the same lower-cased key
                            // for the lookup, the insert and the update.
                            String key = word.substring(1).toLowerCase();
                            if (!alphabetical.containsKey(key))
                                alphabetical.put(key, 1);
                            else
                                alphabetical.put(key, alphabetical.get(key).intValue() + 1);
                        }
                    }
                }
            }
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}
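I tested your code and it works fine. I think you need to check your file's encoding.
It is surely saved as "UTF-8" with a BOM. In that case the first word of the file is read as the character U+FEFF followed by "The", so it is stored under a different key than the second "The". Save the file as "UTF-8 without BOM" and it will be OK!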
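One way to confirm that diagnosis is to look at the first three bytes of the file: a UTF-8 BOM is the byte sequence EF BB BF. The following is only a minimal sketch, not part of the original code; the class name BomCheck and the method startsWithUtf8Bom are made up for illustration, and it assumes Java 11 or later for InputStream.readNBytes(int).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, named here for illustration only.
final class BomCheck {
    /** Returns true if the given bytes begin with the UTF-8 BOM (EF BB BF). */
    static boolean startsWithUtf8Bom(byte[] head) {
        return head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java BomCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        byte[] head;
        try (InputStream in = Files.newInputStream(file)) {
            // Reads up to 3 bytes; returns fewer if the file is shorter.
            head = in.readNBytes(3);
        }
        System.out.println(startsWithUtf8Bom(head)
                ? "File starts with a UTF-8 BOM"
                : "No UTF-8 BOM found");
    }
}
```

If this reports a BOM for the file passed as args[0], the first key in the TreeMap starts with an invisible U+FEFF character, which would explain why containsKey returns false for the second "The".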
Edit:
If you cannot change the encoding, you can handle the BOM manually. See this link:
http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
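As a rough illustration of handling it manually (a sketch under my own assumptions, not the code from the linked page): after UTF-8 decoding the BOM shows up as the single character U+FEFF at the start of the first line, so it can be dropped before the line is split into words. The names BomDemo, stripBom and countWords are hypothetical, and the counting is simplified to plain whitespace splitting rather than the patterns from the question.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, named here for illustration only.
final class BomDemo {
    // U+FEFF is how the byte order mark appears after UTF-8 decoding.
    static final char BOM = '\uFEFF';

    /** Returns the line without a leading BOM character, if one is present. */
    static String stripBom(String line) {
        if (!line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    /** Counts lower-cased, whitespace-separated words in a single line. */
    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Simulates the first line of a file saved as UTF-8 with a BOM.
        String firstLine = BOM + "The Project Gutenberg EBook of The Complete Works";

        // Without stripping, the first "The" is stored under a different key.
        System.out.println(countWords(firstLine).get("the"));           // prints 1
        // With stripping, both occurrences share the key "the".
        System.out.println(countWords(stripBom(firstLine)).get("the")); // prints 2
    }
}
```

In the program from the question, the same idea would mean calling something like stripBom on the first line returned by reader.readLine() before the split.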
Regards