Printing the number of times a word appears in a txt file
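I am trying to find the number of times the word "the" appears in a txt file. With the code below, I keep getting 0 as my output when it should be 4520. I am using a delimiter to separate out "the", but it does not seem to count it at all. The delimiter works when I use "[^a-zA-Z]+" to count all the words.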
in.useDelimiter("[^the]+");
while (in.hasNext()) {
    String words = in.next();
    words = words.toLowerCase();
    wordCount++;
}
System.out.println("The total number of 'the' is " + theWord);
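To see what the delimiter in the question actually does, here is a minimal sketch that runs a Scanner over an in-memory string instead of a file. The class name DelimiterDemo and the helpers tokens and countWord are made up for illustration and are not part of the original code. It lists the tokens produced by "[^the]+", then counts "the" by splitting on non-letters and comparing each token to the word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Collects every token a Scanner yields for the given delimiter regex.
    static List<String> tokens(String text, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            in.useDelimiter(delimiterRegex);
            while (in.hasNext()) {
                result.add(in.next());
            }
        }
        return result;
    }

    // Splits on runs of non-letters, then counts only tokens equal to the word.
    static int countWord(String text, String word) {
        int count = 0;
        for (String token : tokens(text, "[^a-zA-Z]+")) {
            if (token.equalsIgnoreCase(word)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        // "[^the]+" treats every run of characters other than t, h, e as a delimiter,
        // so the tokens are runs made only of those three letters.
        System.out.println(tokens(text, "[^the]+")); // [he, e, the]
        System.out.println(countWord(text, "the"));  // 2
    }
}
```

Note also that the loop in the question increments wordCount but prints theWord, which is never changed, so the printed value would stay at its initial value regardless of the delimiter.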
In Java 9+, you can count the occurrences of a word in a text file as follows:
static long countWord(String filename, String word) throws IOException {
    // \b is a regex word boundary, written as "\\b" in a Java string literal
    Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
    return Files.lines(Paths.get(filename)).flatMap(s -> p.matcher(s).results()).count();
}
Test
System.out.println(countWord("test.txt", "the"));
test.txt
The quick brown fox
jumps over the lazy dog
Output
2
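One detail worth considering with Files.lines: the returned stream keeps the file open until it is closed, and the Javadoc recommends try-with-resources. The following is a sketch of that variant, not a replacement for the method above. The class name ClosingWordCount and the temporary file standing in for test.txt are assumptions made so the example runs on its own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class ClosingWordCount {
    static long countWord(Path file, String word) throws IOException {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        // try-with-resources closes the underlying file once counting is done
        try (Stream<String> lines = Files.lines(file)) {
            return lines.flatMap(s -> p.matcher(s).results()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary stand-in for test.txt (an assumption for this sketch)
        Path tmp = Files.createTempFile("test", ".txt");
        try {
            Files.write(tmp, Arrays.asList("The quick brown fox", "jumps over the lazy dog"));
            System.out.println(countWord(tmp, "the")); // 2
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```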
Java 8 version:
static int countWord(String filename, String word) throws IOException {
    Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
    return Files.lines(Paths.get(filename)).mapToInt(s -> {
        // Count the matches on this line
        int count = 0;
        for (Matcher m = p.matcher(s); m.find(); )
            count++;
        return count;
    }).sum();
}
Java 7 version:
static int countWord(String filename, String word) throws IOException {
    Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
    int count = 0;
    try (BufferedReader in = Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8)) {
        for (String line; (line = in.readLine()) != null; )
            for (Matcher m = p.matcher(line); m.find(); )
                count++;
    }
    return count;
}
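UPDATE
Full code for a Java 7+ version that does not use a separate method and uses the much slower Scanner, since the OP seems to have trouble copy/pasting the methods above into their code.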
import java.io.File;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) throws Exception {
        int count = 0;
        try (Scanner in = new Scanner(new File("test.txt"))) {
            Pattern p = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);
            while (in.hasNextLine())
                for (Matcher m = p.matcher(in.nextLine()); m.find(); )
                    count++;
        }
        System.out.println("The total number of 'the' is " + count);
    }
}
For comparison, the full version using the first method in this answer is:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) throws IOException {
        System.out.println("The total number of 'the' is " + countWord("test.txt", "the"));
    }
    static long countWord(String filename, String word) throws IOException {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        return Files.lines(Paths.get(filename)).flatMap(s -> p.matcher(s).results()).count();
    }
}
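To illustrate what the word boundaries and Pattern.quote contribute in the pattern used throughout this answer, here is a small sketch that counts matches in an in-memory string. The class WordBoundaryDemo, the count helper, and its quote flag are hypothetical names introduced only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Counts whole-word, case-insensitive matches of word in text.
    // With quote == false the word is interpreted as a regex instead of literal text.
    static int count(String text, String word, boolean quote) {
        String body = quote ? Pattern.quote(word) : word;
        Pattern p = Pattern.compile("\\b" + body + "\\b", Pattern.CASE_INSENSITIVE);
        int n = 0;
        for (Matcher m = p.matcher(text); m.find(); ) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // "them" and "other" contain "the" but fail the \b checks
        System.out.println(count("He gave them the sword. The other THE.", "the", true)); // 3
        // Quoted: the dot is literal, so "axb" is not counted
        System.out.println(count("a.b axb a.b", "a.b", true));  // 2
        // Unquoted: the dot matches any character, so "axb" is counted too
        System.out.println(count("a.b axb a.b", "a.b", false)); // 3
    }
}
```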
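Use \b(?i)(the)\b as the regex, where \b is a word boundary, (?i) turns on case-insensitive matching, and (the) matches the as a whole. Note that [] matches any single one of the characters it contains, not the enclosed text as a whole, so [^the]+ matches a run of characters other than t, h, and e.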
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class Main {
    public static void main(String[] args) {
        // try-with-resources closes the Scanner even if reading fails
        try (Scanner in = new Scanner(new File("file.txt"))) {
            int wordCount = 0;
            while (in.hasNextLine()) {
                // A negative limit keeps trailing empty strings, so n matches always give n + 1 pieces
                wordCount += in.nextLine().split("\\b(?i)(the)\\b", -1).length - 1;
            }
            System.out.println("The total number of 'the' is " + wordCount);
        } catch (FileNotFoundException e) {
            System.out.println("File does not exist");
        }
    }
}
Output:
The total number of 'the' is 5
Contents of file.txt:
The cat jumped over the rat.
The is written as THE in capital letter.
He gave them the sword.
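One caveat when counting by splitting: String.split with the default limit drops trailing empty strings, so the number of pieces does not always equal the number of matches plus one. The sketch below compares the default call with a negative limit on a few lines. The class SplitLimitDemo and its two helpers are illustrative names, not part of the answer above.

```java
class SplitLimitDemo {
    static final String THE = "\\b(?i)(the)\\b";

    // Number of pieces with the default limit (trailing empty strings removed).
    static int piecesDefault(String line) {
        return line.split(THE).length;
    }

    // Number of pieces with limit -1 (trailing empty strings kept).
    static int piecesKeepTrailing(String line) {
        return line.split(THE, -1).length;
    }

    public static void main(String[] args) {
        String[] lines = { "I saw the", "the the", "the", "" };
        for (String line : lines) {
            System.out.println("\"" + line + "\": default=" + piecesDefault(line)
                    + ", limit -1=" + piecesKeepTrailing(line));
        }
        // "I saw the": default=1, limit -1=2
        // "the the":   default=2, limit -1=3
        // "the":       default=0, limit -1=2
        // "":          default=1, limit -1=1
    }
}
```

With the negative limit, pieces minus one equals the number of matches on every line here (1, 2, 1, 0), which is why a Matcher loop or split(regex, -1) is the safer way to count.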