Can't count words and characters with PDFBox in Java
class ReadPDF {
    public void Read() throws IOException {
        int amountOfWords = 0;
        int amountOfChars = 0;
        String sourceCode = "";
        try {
            // Backslashes in a Java string literal must be escaped.
            PDDocument doc = PDDocument.load(new File("C:\\Users\\ccw\\Desktop\\articles\\RECYCLING-BEHAVIOUR-AMONG-MALAYSIAN-TERTIARY-STUDENTS.pdf"));
            String text = new PDFTextStripper().getText(doc);
            sourceCode = sourceCode.replace ("-", "").replace (".", "");
            while (doc != null) {
                String[] words = sourceCode.split(" ");
                amountOfWords = amountOfWords + words.length;
                for (String word : words) {
                    amountOfChars = amountOfChars + word.length();
                }
            }
            System.out.println("Amount of Chars is " + amountOfChars);
            System.out.println("Amount of Words is " + (amountOfWords + 1));
            System.out.println("Average Word Length is " + (amountOfChars / amountOfWords));
        } catch (IOException e) {
            System.out.println(e);
        }
    }
}
I am trying to count all the words and characters in a PDF file using PDFBox.
But now I get an error saying that sourceCode has not been initialized.
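Replace this line: sourceCode = sourceCode.replace ("-", "").replace (".", "");
with: sourceCode = text.replace ("-", "").replace (".", "");
and remove the while loop. As written, the extracted text is never used, and doc is never set to null, so while(doc!=null) never terminates.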
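The counting step, with both changes applied, could look roughly like the sketch below. It takes the already extracted text as a String, so the PDFBox calls from the question (PDDocument.load and PDFTextStripper.getText) stay where they are. The class name TextStats and its members are hypothetical and not part of PDFBox. Two things go beyond the answer and are assumptions: splitting on any whitespace rather than a single space (extracted PDF text contains line breaks), and returning 0.0 for the average when there are no words, to avoid a division by zero.

```java
// Hypothetical helper, not part of PDFBox: counts words and characters
// in text that has already been extracted from the PDF.
final class TextStats {
    final int words;
    final int chars;

    private TextStats(int words, int chars) {
        this.words = words;
        this.chars = chars;
    }

    // Average word length; 0.0 when there are no words (assumption,
    // chosen to avoid dividing by zero).
    double averageWordLength() {
        return words == 0 ? 0.0 : (double) chars / words;
    }

    static TextStats of(String text) {
        // Same cleanup as in the answer: drop hyphens and full stops.
        String cleaned = text.replace("-", "").replace(".", "").trim();
        if (cleaned.isEmpty()) {
            return new TextStats(0, 0);
        }
        // Assumption: split on any run of whitespace, not just a single
        // space, so line breaks and double spaces do not distort the count.
        String[] tokens = cleaned.split("\\s+");
        int charCount = 0;
        for (String token : tokens) {
            charCount += token.length();
        }
        // No loop around this code: the text is processed exactly once.
        return new TextStats(tokens.length, charCount);
    }
}
```

In Read(), this would be called as TextStats.of(text) right after getText(doc). Because the word count comes directly from the token array, the "+ 1" in the original print statement should not be needed.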