How to match a text file against multiple regex patterns and count the number of occurrences of each pattern?
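I want to find and count, separately, all occurrences of unit, device, method and module in each line of a text file. This is what I have so far, but I don't know how to use multiple patterns or how to count the occurrences of each word in a line separately. At the moment it only counts the occurrences of all the words together for each line. Thanks in advance!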
private void countPaterns() throws IOException {
    Pattern nom = Pattern.compile("unit|device|method|module|material|process|system");
    String str = null;
    BufferedReader r = new BufferedReader(new FileReader("D:/test/test1.txt"));
    while ((str = r.readLine()) != null) {
        Matcher matcher = nom.matcher(str);
        int countnomen = 0;
        while (matcher.find()) {
            countnomen++;
        }
        //intList.add(countnomen);
        System.out.println(countnomen + " davon ist das Wort System");
    }
    r.close();
    //return intList;
}
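It is better to use word boundaries (written \\b inside a Java string literal) and a Map to keep a count for each matched keyword.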
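import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// "\\b" is a regex word boundary; a single "\b" in a Java string is a backspace character.
Pattern nom = Pattern.compile("\\b(unit|device|method|module|material|process|system)\\b");
String str = null;
BufferedReader r = new BufferedReader(new FileReader("D:/test/test1.txt"));
// Keyword -> number of matches, accumulated over the whole file.
Map<String, Integer> counts = new HashMap<>();
while ((str = r.readLine()) != null) {
    Matcher matcher = nom.matcher(str);
    while (matcher.find()) {
        // Group 1 holds whichever alternative matched.
        String key = matcher.group(1);
        int c = 0;
        if (counts.containsKey(key))
            c = counts.get(key);
        counts.put(key, c + 1);
    }
}
r.close();
System.out.println(counts);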
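Since the question asks for counts per line rather than for the whole file, here is a minimal sketch of that variation. The class name KeywordCounter, the method countInLine and the sample lines are made up for illustration; the pattern is the one from the answer above, and a fresh map is built for every line so the counts do not carry over.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class KeywordCounter {
    // Word boundaries (\\b) keep "unit" from matching inside "community".
    private static final Pattern KEYWORDS =
            Pattern.compile("\\b(unit|device|method|module|material|process|system)\\b");

    // Counts each keyword separately within a single line.
    static Map<String, Integer> countInLine(String line) {
        // LinkedHashMap keeps keywords in the order they first appear.
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher matcher = KEYWORDS.matcher(line);
        while (matcher.find()) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the old value.
            counts.merge(matcher.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines standing in for the lines read from the file.
        String[] lines = {
            "the device and the unit use one method",
            "a system with a module inside another system"
        };
        for (String line : lines) {
            System.out.println(countInLine(line));
        }
        // prints {device=1, unit=1, method=1}
        // prints {system=2, module=1}
    }
}
```

To use it with a file, call countInLine(str) inside the readLine() loop and print or store the returned map for each line.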
Here is a Java 9 (and above) solution:
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    List<String> expressions = List.of("(good)", "(bad)");
    String phrase = " good bad bad good good bad bad bad";
    for (String regex : expressions) {
        Pattern gPattern = Pattern.compile(regex);
        Matcher matcher = gPattern.matcher(phrase);
        // Matcher.results() (Java 9+) streams every match of the pattern.
        long count = matcher.results().count();
        System.out.println("Pattern \"" + regex + "\" appears " + count + (count == 1 ? " time" : " times"));
    }
}
Output:
Pattern "(good)" appears 3 times
Pattern "(bad)" appears 5 times
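The stream returned by Matcher.results() can also be grouped, so that a single alternation pattern yields a separate count for each word in one pass. This is only a sketch of that idea: the class ResultsGrouping and the method countWords are hypothetical names, and it needs Java 9 or later.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original answer.
class ResultsGrouping {
    // Groups every match by its matched text and counts each group.
    static Map<String, Long> countWords(Pattern pattern, String text) {
        return pattern.matcher(text).results()
                .collect(Collectors.groupingBy(
                        m -> m.group(),          // key: the matched word
                        TreeMap::new,            // sorted keys for stable output
                        Collectors.counting())); // value: number of matches
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\b(good|bad)\\b");
        System.out.println(countWords(pattern, " good bad bad good good bad bad bad"));
        // prints {bad=5, good=3}
    }
}
```

Compared with compiling one pattern per word, this scans the text once; words that never occur are simply absent from the map.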