Counting the number of each character in a file

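I am reading the contents of a text file character by character, then sorting the characters in ascending order and counting how many times each one occurs. When I run the program, my numbers are off: for example, the file contains 7 'A's, but I get 17. I think this means something is wrong with my counting, or with the way I am reading the characters. What is the problem?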

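import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class CharacterCounts {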

    public static void main(String[] args) throws IOException{
        String fileName = args[0];
        BufferedReader in = new BufferedReader(new FileReader(new File(fileName)));
        ArrayList<Character> vals = new ArrayList<Character>();
        ArrayList<Integer> valCounts = new ArrayList<Integer>();

        while(in.read() != -1){
            vals.add((char)in.read());
        }

        Collections.sort(vals);

        //This counts how many times each char occurs,
        //resets count to 0 upon finding a new char.
        int count = 0;
        for(int i = 1; i < vals.size(); i++){
            if(vals.get(i - 1) == vals.get(i)){
                count++;
            } else {
                valCounts.add(count + 1);
                count = 0;
            }
        }

        //Removes duplicates from vals by moving from set then back to ArrayList
        Set<Character> hs = new HashSet<Character>();
        hs.addAll(vals);
        vals.clear();
        vals.addAll(hs);

        //System.out.print(vals.size() + "," + valCounts.size());

        for(int i = 0; i < vals.size(); i++){
            //System.out.println(vals.get(i));
            System.out.printf("'%c' %d\n", vals.get(i), valCounts.get(i));
        }

    }
}

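When you write

if(vals.get(i - 1) == vals.get(i)){

you are comparing two Character object references, not the characters they hold. Two boxed values that hold the same character are not guaranteed to be the same object (autoboxing caches values in the ASCII range, so == can appear to work there, but it should not be relied on). You have to compare their values.

You want

if(vals.get(i - 1).equals(vals.get(i))){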
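
To show how the `equals` comparison fits into the counting loop, here is a minimal sketch rather than a drop-in replacement. It also records the final run of characters, which the loop in the question never adds to `valCounts`, and it stores each character together with its count so the two cannot drift apart. The class name `SortedRunCounts` and the method `countRuns` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedRunCounts {

    // Sorts a copy of the input so equal characters are adjacent,
    // then counts the length of each run of equal characters.
    static Map<Character, Integer> countRuns(List<Character> input) {
        List<Character> vals = new ArrayList<>(input);
        Collections.sort(vals);

        Map<Character, Integer> counts = new TreeMap<>();
        int count = 0;
        for (int i = 0; i < vals.size(); i++) {
            count++;
            // A run ends at the last element or when the next value differs.
            // equals() compares the characters, not the object references.
            boolean lastOfRun = (i == vals.size() - 1)
                    || !vals.get(i).equals(vals.get(i + 1));
            if (lastOfRun) {
                counts.put(vals.get(i), count);
                count = 0;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Character> vals = new ArrayList<>();
        for (char ch : "banana".toCharArray()) {
            vals.add(ch);
        }
        System.out.println(countRuns(vals)); // prints {a=3, b=1, n=2}
    }
}
```

Because the result is a `TreeMap`, the characters print in ascending order without the separate `HashSet` step, whose iteration order is not guaranteed to match the sorted counts.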

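I think your counting logic is more complicated than it needs to be. Also, you call read() twice per loop iteration (once in the while condition and once in the body), so every other character is read and thrown away.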

int[] counts = new int[Character.MAX_VALUE + 1]; // one slot per possible char value; a Reader's read() returns 0..65535
int i;
while ((i = in.read()) != -1) { // call read() only once per character
    counts[i]++;
}

System.out.println(counts['a']);
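
As a self-contained sketch of the array-based approach above: the helper below reads from any `Reader`, so it can be tried with a `StringReader` before pointing it at a file. The class name `CharArrayCounts` and the method `countChars` are hypothetical, not part of the original code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class CharArrayCounts {

    // Returns an array where counts[c] is the number of times char c was read.
    static int[] countChars(Reader in) throws IOException {
        // Reader.read() yields values in 0..65535, so size the array for every char.
        int[] counts = new int[Character.MAX_VALUE + 1];
        int c;
        // read() is called exactly once per iteration; -1 signals end of stream.
        while ((c = in.read()) != -1) {
            counts[c]++;
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        try (Reader in = new StringReader("Banana AAA")) {
            int[] counts = countChars(in);
            System.out.println(counts['A']); // prints 3
            System.out.println(counts['a']); // prints 3
        }
    }
}
```

For a real file, the same method can be handed a `BufferedReader` wrapping a `FileReader`, as in the question.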

Why not use regular expressions? The code becomes more flexible and simpler. Take a look at the code below:

 ...
 final StringBuilder contents = new StringBuilder();
 // read the whole file into a StringBuilder (readLine() drops the line terminators)
 try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
     String line;
     while ((line = reader.readLine()) != null) {
         contents.append(line);
     }
 }

 Map<Character,Integer> report = new TreeMap<>();
 // iterate over the chars from 'a' to 'z', inclusive
 for (char a = 'a'; a <= 'z'; a++) {
     String c = Character.toString(a);

     // skip non-word chars (never true for 'a'..'z'; only matters if the range is widened)
     if (c.matches("\\W"))
         continue;

     String C = c.toUpperCase();
     // match both the lowercase and the uppercase form of the letter
     Pattern pattern = Pattern.compile("[" + c + C + "]");
     Matcher m = pattern.matcher(contents);
     // the counter starts at 0 for each letter
     int count = 0;
     while (m.find()) {
         count++;
     }

     if (count > 0) {
         report.put(a, count);
     }
 }

 System.out.println(report);
 ...
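
For comparison, the same case-insensitive letter report can be built in a single pass without regular expressions. This is a different technique from the answer above, shown only as a sketch under the assumption that just the ASCII letters a to z matter. The class name `LetterReport` and the method `report` are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

class LetterReport {

    // Counts ASCII letters in the text, folding uppercase into lowercase.
    static Map<Character, Integer> report(String contents) {
        Map<Character, Integer> report = new TreeMap<>();
        for (int i = 0; i < contents.length(); i++) {
            char ch = contents.charAt(i);
            // Fold 'A'..'Z' onto 'a'..'z'; everything else is left unchanged.
            if (ch >= 'A' && ch <= 'Z') {
                ch = (char) (ch - 'A' + 'a');
            }
            if (ch >= 'a' && ch <= 'z') {
                // merge() inserts 1 for a new key or adds 1 to the existing count.
                report.merge(ch, 1, Integer::sum);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        System.out.println(report("Hello, World!"));
        // prints {d=1, e=1, h=1, l=3, o=2, r=1, w=1}
    }
}
```

This scans the text once instead of once per letter, which may matter for large files. The regex version remains easier to adapt when the set of characters to match is more complex.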