Counting the number of each character in a file
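I am reading the contents of a text file one character at a time, then sorting the characters in ascending order and counting how many times each one occurs. When I run the program, my numbers are off: for example, the file contains 7 'A's, but I get 17. I assume this means something is wrong either with my counting or with the way I read the characters. What is the problem?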
public class CharacterCounts {
    public static void main(String[] args) throws IOException{
        String fileName = args[0];
        BufferedReader in = new BufferedReader(new FileReader(new File(fileName)));
        ArrayList<Character> vals = new ArrayList<Character>();
        ArrayList<Integer> valCounts = new ArrayList<Integer>();
        while(in.read() != -1){
            vals.add((char)in.read());
        }
        Collections.sort(vals);
        //This counts how many times each char occurs,
        //resets count to 0 upon finding a new char.
        int count = 0;
        for(int i = 1; i < vals.size(); i++){
            if(vals.get(i - 1) == vals.get(i)){
                count++;
            } else {
                valCounts.add(count + 1);
                count = 0;
            }
        }
        //Removes duplicates from vals by moving from set then back to ArrayList
        Set<Character> hs = new HashSet<Character>();
        hs.addAll(vals);
        vals.clear();
        vals.addAll(hs);
        //System.out.print(vals.size() + "," + valCounts.size());
        for(int i = 0; i < vals.size(); i++){
            //System.out.println(vals.get(i));
            System.out.printf("'%c' %d\n", vals.get(i), valCounts.get(i));
        }
    }
}
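When you write
if(vals.get(i - 1) == vals.get(i)){
you are comparing two Character object references, not the characters they hold, so two elements with the same character are not guaranteed to compare equal. You have to compare their values.
What you want is
if(vals.get(i - 1).equals(vals.get(i))){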
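To make the difference concrete, here is a minimal sketch. The class name CharacterEquality and the helper sameValue are made up for illustration, and the character value 0x4E2D is an arbitrary choice from outside the ASCII range. The equals() call compares the wrapped char values; what == returns for two boxed Character objects depends on whether the JVM happens to hand back the same object, so no expected result is shown for it.

```java
public class CharacterEquality {
    // Hypothetical helper: true when both boxed characters hold the same char value.
    static boolean sameValue(Character a, Character b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Box the same non-ASCII character twice.
        Character x = Character.valueOf((char) 0x4E2D);
        Character y = Character.valueOf((char) 0x4E2D);

        // Value comparison: this is what the counting loop needs.
        System.out.println("equals: " + sameValue(x, y));

        // Reference comparison: asks whether x and y are the same object,
        // which is not guaranteed even though the values match.
        System.out.println("==:     " + (x == y));
    }
}
```

Unboxing to char first, for example with charValue(), is another way to get a value comparison.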
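I think your counting logic is more complicated than it needs to be. Also, you call read() twice per loop iteration (once in the while condition and once in the body), so you discard every other character.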
int[] counts = new int[Character.MAX_VALUE + 1]; // one slot per char value; a Reader returns chars (0..65535), not bytes
int i;
while ((i = in.read()) != -1) { // Note you should only be calling read once for each value
    counts[i]++;
}
System.out.println(counts['a']);
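Putting the read-once loop together with the sorted output the question asks for might look like the sketch below. It is not the asker's program: the class name CharacterCountsSketch and the method countChars are hypothetical, the file is assumed to be UTF-8 encoded, and a TreeMap is swapped in for the sort-then-deduplicate steps because it keeps its keys in ascending order.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class CharacterCountsSketch {
    // Hypothetical helper: counts every char delivered by the reader.
    // A TreeMap iterates its keys in ascending order, so no separate sort is needed.
    static Map<Character, Integer> countChars(Reader in) throws IOException {
        Map<Character, Integer> counts = new TreeMap<>();
        int c;
        // read() is called exactly once per character.
        while ((c = in.read()) != -1) {
            counts.merge((char) c, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CharacterCountsSketch <file>");
            return;
        }
        // Assumption: the input file is UTF-8 encoded.
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            for (Map.Entry<Character, Integer> e : countChars(in).entrySet()) {
                System.out.printf("'%c' %d%n", e.getKey(), e.getValue());
            }
        }
    }
}
```

Because each character and its count live in the same map entry, the two parallel lists from the question, and the risk of them getting out of step, go away.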
Why not use a regular expression? The code becomes simpler and more flexible. Take a look at the code below:
...
final BufferedReader reader = new BufferedReader(new FileReader(filename));
final StringBuilder contents = new StringBuilder();
//read content in a string builder (readLine() drops the line terminators)
String line;
while((line = reader.readLine()) != null) {
    contents.append(line);
}
reader.close();
Map<Character,Integer> report = new TreeMap<>();
//init a counter
int count = 0;
//Iterate the chars from 'a' to 'z' inclusive
for(char a = 'a'; a <= 'z'; a++){
    String c = Character.toString(a);
    //skip non-word chars (the backslash must be escaped in a Java string literal)
    if(c.matches("\\W"))
        continue;
    String C = c.toUpperCase();
    //match uppercase and lowercase char
    Pattern pattern = Pattern.compile("[" + c + C + "]", Pattern.MULTILINE);
    Matcher m = pattern.matcher(contents.toString());
    while(m.find()){
        count++;
    }
    if(count > 0){
        report.put(a, count);
    }
    //reset the counter
    count = 0;
}
System.out.println(report);
...