Writing the output of a program into a file
I wrote a program that parses a PDF into text. I get the output in the console, but I cannot write it to a file. This is the code I have so far:
public class PDFTextParser {
    public static void main(String args[]) throws IOException {
        PDFTextStripper pdfStripper = null;
        COSDocument cosDoc = null;
        try {
            File file = new File("1.pdf");
            PDDocument pdDoc = PDDocument.load(file);
            pdfStripper = new PDFTextStripper();
            String parsedText = pdfStripper.getText(pdDoc);
            System.out.println(parsedText);
            FileWriter out = new FileWriter("output.txt");
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String line = in.readLine();
            while (line != null) {
                out.append(line);
                out.append("\n");
            }
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The output is:
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (6:0) at offset 1013093 does not end with 'endobj' but with '7'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (7:0) at offset 1013211 does not end with 'endobj' but with '483'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (9:0) at offset 1020280 does not end with 'endobj' but with '10'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (10:0) at offset 1020396 does not end with 'endobj' but with '15'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (15:0) at offset 1020519 does not end with 'endobj' but with '16'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (16:0) at offset 1020640 does not end with 'endobj' but with '17'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (17:0) at offset 1020756 does not end with 'endobj' but with '18'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (18:0) at offset 1020874 does not end with 'endobj' but with '19'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (19:0) at offset 1020993 does not end with 'endobj' but with '24'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (24:0) at offset 1021111 does not end with 'endobj' but with '25'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (25:0) at offset 1021228 does not end with 'endobj' but with '26'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (26:0) at offset 1021350 does not end with 'endobj' but with '27'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (27:0) at offset 1021469 does not end with 'endobj' but with '28'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (28:0) at offset 1021589 does not end with 'endobj' but with '489'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (458:0) at offset 1026684 does not end with 'endobj' but with '463'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (463:0) at offset 1026809 does not end with 'endobj' but with '464'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (464:0) at offset 1026932 does not end with 'endobj' but with '465'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (465:0) at offset 1027050 does not end with 'endobj' but with '466'
Apr 07, 2016 2:04:10 PM org.apache.pdfbox.pdfparser.COSParser parseFileObject
WARNING: Object (466:0) at offset 1027170 does not end with 'endobj' but with '495'
The parsed PDF text then appears in the console, but the output file I get is empty.
Have you checked this post? system-out-to-a-file-in-java
I like its first answer, though:
java -jar myjar.jar > output.txt
In your case it would be something like:
java -cp <classpath> PDFTextParser > output.txt
Hope this helps.
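If redirecting from the shell is not an option, the same effect can be approximated from inside the program with `System.setOut`. The sketch below is only an illustration: the class `RedirectStdout` and the helper `runWithStdoutRedirected` are hypothetical names, and a plain string stands in for the parsed PDF text. Note also that the WARNING lines in the question look like `java.util.logging` output, which by default goes to standard error, so they would probably not end up in the file with either approach.

```java
import java.io.IOException;
import java.io.PrintStream;

class RedirectStdout {

    // Hypothetical helper: runs `body` while System.out points at the given
    // file, then restores the original stream.
    static void runWithStdoutRedirected(String fileName, Runnable body) throws IOException {
        PrintStream original = System.out;
        // try-with-resources closes (and flushes) the file stream first;
        // the finally block then puts the console stream back.
        try (PrintStream fileOut = new PrintStream(fileName, "UTF-8")) {
            System.setOut(fileOut);
            body.run();
        } finally {
            System.setOut(original);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder for System.out.println(parsedText) in the real program.
        runWithStdoutRedirected("output.txt",
                () -> System.out.println("text extracted from the PDF"));
    }
}
```

Restoring the original stream in `finally` keeps later console output working even if the body throws.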
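You have already extracted the text from the PDF; you just need to write it to the file.
The rest of your code tries to read input from the user (for example, from the keyboard).
You don't need that part. Just use the code below: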
String parsedText = pdfStripper.getText(pdDoc);
System.out.println(parsedText);
FileWriter out = new FileWriter("output.txt");
out.append(parsedText);
out.close();
// no need for this code, it reads input from the user (using the keyboard)
/*
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
String line = in.readLine();
while (line != null) {
    out.append(line);
    out.append("\n");
}
out.close();
*/
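To spell out why the original file stayed empty: the loop reads `line` once and never reads it again, so it either does not run at all (end of input) or never terminates, and `parsedText` is never handed to the writer. A minimal, self-contained sketch of the write step follows. It uses try-with-resources so the writer is closed even on failure; the class `WriteTextToFile`, the helper `writeText`, and the placeholder string are assumptions for illustration, not part of the original program.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class WriteTextToFile {

    // Hypothetical helper: writes the whole string to the target file,
    // creating it or overwriting any existing content.
    static void writeText(Path target, String text) throws IOException {
        // Closing the writer flushes it; try-with-resources does this
        // automatically, even if write() throws.
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder; in the real program this would be pdfStripper.getText(pdDoc).
        String parsedText = "text extracted from the PDF";
        writeText(Paths.get("output.txt"), parsedText);
    }
}
```

Passing an explicit charset avoids depending on the platform default, which matters when the PDF contains non-ASCII text.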