FileInputStream only reads the first word in a file

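I want to read the words in file.txt token by token, add a part-of-speech tag to each word, and write the result to file2.txt. The content of file.txt is already tokenized. Here is my code: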

    import java.io.BufferedReader;
    import java.io.DataInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.StringReader;

    import opennlp.tools.cmdline.PerformanceMonitor;
    import opennlp.tools.cmdline.postag.POSModelLoader;
    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.tokenize.WhitespaceTokenizer;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;

    public class PoSTagging {
        @SuppressWarnings("resource")
        public static void PoStagMethod() throws IOException {

            FileInputStream fin = new FileInputStream("C:\\Users\\dell\\Desktop\\file.txt");
            DataInputStream in = new DataInputStream(fin);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strline = br.readLine();
            System.out.println(strline + "first");

            try {
                POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
                PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
                POSTaggerME tagger = new POSTaggerME(model);

                String input = strline;
                @SuppressWarnings("deprecation")
                ObjectStream<String> lineStream = new PlainTextByLineStream(new StringReader(input));

                perfMon.start();
                String line;
                while ((line = lineStream.read()) != null) {

                    String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line);
                    String[] tags = tagger.tag(whitespaceTokenizerLine);

                    POSSample sample = new POSSample(whitespaceTokenizerLine, tags);
                    System.out.println(sample.toString() + "second");
                    //String t = sample.toString();

                    FileOutputStream fout = new FileOutputStream("C:\\Users\\dell\\Desktop\\file2.txt");
                    //fout.write(t.getBytes());

                    perfMon.incrementCounter();
                    fout.close();
                }
                perfMon.stopAndPrintFinalResult();
            }
            catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

When I call PoStagMethod() from another class, only the first word of file.txt is written to file2.txt. Why doesn't it read the other words in the file? What is wrong with my code?

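Your code calls br.readLine() only once, before the try block, so only the first line of file.txt is ever tagged; if the tokenized file holds one word per line, that is just the first word. The while loop then iterates over the lines of that single string, not over the file. Instead, you can read file.txt line by line with a BufferedReader, tag each line with the POSModel, and write the output to file2.txt with a BufferedWriter. The following snippet may help: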

    POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
    PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
    POSTaggerME tagger = new POSTaggerME(model);

    // Open the output file once, before the loop, so it is not truncated on every iteration
    BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter("C:\\Users\\dell\\Desktop\\file2.txt"));

    BufferedReader bufferedReader = new BufferedReader(new FileReader("C:\\Users\\dell\\Desktop\\file.txt"));
    String line;
    perfMon.start();
    // readLine() is called on every iteration, so every line of file.txt gets processed
    while ((line = bufferedReader.readLine()) != null) {
        String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line);
        String[] tags = tagger.tag(whitespaceTokenizerLine);
        // Do your work with the tags and tokenized words; here each input line
        // is written as "word_TAG word_TAG ..."
        POSSample sample = new POSSample(whitespaceTokenizerLine, tags);
        bufferedWriter.write(sample.toString());
        // Starts a new line in the output file; remove it if you want everything on one line
        bufferedWriter.newLine();
        perfMon.incrementCounter();
    }
    perfMon.stopAndPrintFinalResult();

    // Do not forget to flush() and close() the streams once you are done:
    bufferedWriter.flush();
    bufferedWriter.close();
    bufferedReader.close();
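The question's loop also creates a new FileOutputStream for file2.txt on every iteration. new FileOutputStream(path) opens the file in overwrite mode, so each new stream truncates whatever the previous one wrote, and only the last write would survive. The sketch below demonstrates this on a temporary file; the class and method names are hypothetical, and the second constructor argument, append, is what switches to append mode. Opening the writer once before the loop, as the snippet above does, avoids the problem entirely.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class: reproduces the "new stream per iteration" pattern.
class TruncationDemo {
    // Opens a fresh FileOutputStream for every word, like the question's loop does.
    // With append == false (the default of FileOutputStream(String)), each open truncates the file.
    static String writeEachWord(Path out, String[] words, boolean append) throws IOException {
        Files.deleteIfExists(out);
        for (String word : words) {
            try (FileOutputStream fout = new FileOutputStream(out.toFile(), append)) {
                fout.write((word + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
        // Return the final file content so the effect is easy to inspect
        return new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("file2", ".txt");
        String[] words = {"The", "dog", "barks"};
        System.out.print(writeEachWord(out, words, false)); // only "barks" survives
        System.out.print(writeEachWord(out, words, true));  // all three words
        Files.delete(out);
    }
}
```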

If you can, it is also a good idea to replace the old-style try-catch clause with try-with-resources, which was added in Java 7 and closes the resources automatically.
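A minimal sketch of the same read-tag-write loop using try-with-resources. So that it runs without en-pos-maxent.bin, the OpenNLP tagger is replaced by a hypothetical Function<String[], String[]> parameter, and tokenization is a plain split on whitespace standing in for WhitespaceTokenizer. With a real POSTaggerME you should be able to pass tagger::tag as that function, but treat that as an assumption to check against your OpenNLP version.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical class name; the tagger is an assumed stand-in for POSTaggerME.tag.
class TagFileWithResources {
    // Reads `in` line by line, tags each line's whitespace-separated tokens with `tagger`,
    // and writes one "word_TAG word_TAG ..." line per input line to `out`.
    // Both streams are declared in the try-with-resources header, so they are closed
    // (and the writer flushed) automatically, even if an exception is thrown.
    static void tagFile(Path in, Path out, Function<String[], String[]> tagger) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                // Splitting an empty string yields [""], so blank lines get no tokens at all
                String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
                String[] tags = tagger.apply(tokens);
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < tokens.length; i++) {
                    if (i > 0) {
                        sb.append(' ');
                    }
                    sb.append(tokens[i]).append('_').append(tags[i]);
                }
                writer.write(sb.toString());
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("file", ".txt");
        Path out = Files.createTempFile("file2", ".txt");
        Files.write(in, Arrays.asList("The dog", "barks"), StandardCharsets.UTF_8);
        // Hypothetical stand-in for a real tagger: tags every token as "NN"
        Function<String[], String[]> fakeTagger = tokens -> {
            String[] tags = new String[tokens.length];
            Arrays.fill(tags, "NN");
            return tags;
        };
        tagFile(in, out, fakeTagger);
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8)); // [The_NN dog_NN, barks_NN]
        Files.delete(in);
        Files.delete(out);
    }
}
```

Because the streams are closed by the try statement itself, there is no need for the explicit flush() and close() calls at the end.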

Also, if you need to write each word and its tag on a separate line, you need an inner loop that writes to the file. It would look like this:

    POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
    PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
    POSTaggerME tagger = new POSTaggerME(model);

    BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter("C:\\Users\\dell\\Desktop\\file2.txt"));

    BufferedReader bufferedReader = new BufferedReader(new FileReader("C:\\Users\\dell\\Desktop\\file.txt"));
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line);
        String[] tags = tagger.tag(whitespaceTokenizerLine);
        // Loop by index (not for-each) so each word can be paired with its tag
        for (int i = 0; i < whitespaceTokenizerLine.length; i++) {
            bufferedWriter.write(whitespaceTokenizerLine[i] + "_" + tags[i]);
            // Put each word/tag pair on its own line
            bufferedWriter.newLine();
        }
    }

    // Do not forget to flush() and close() the streams once you are done:
    bufferedWriter.flush();
    bufferedWriter.close();
    bufferedReader.close();
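The key detail in the inner loop is that a word and its tag share the same array index, which a for-each loop over the words alone cannot give you. Here is a small, self-contained sketch of just that pairing step; the class and method names are hypothetical, and the tags in main are hard-coded stand-ins rather than real tagger output. It relies on POSTaggerME.tag returning one tag per input token, and it fails loudly if the lengths ever differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pairs tokens with tags by index, one "word_TAG" entry per token.
class WordTagLines {
    static List<String> toWordTagLines(String[] tokens, String[] tags) {
        // A tagger is expected to return exactly one tag per token; report a mismatch
        // instead of silently dropping words or throwing ArrayIndexOutOfBoundsException
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException(
                "expected one tag per token, got " + tokens.length + " tokens and " + tags.length + " tags");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            lines.add(tokens[i] + "_" + tags[i]);
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "dog", "barks"};
        String[] tags = {"DT", "NN", "VBZ"}; // hard-coded stand-in for tagger output
        for (String line : toWordTagLines(tokens, tags)) {
            System.out.println(line); // The_DT, dog_NN, barks_VBZ, one per line
        }
    }
}
```

Each returned entry can then be passed to bufferedWriter.write(...) followed by bufferedWriter.newLine().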

Hope this helps,

good luck.