Java: reading a file into a list

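I need a way to turn an input file into a list of sentences, where the sentences are separated by more than one character, specifically periods and exclamation marks (! or .)

My input file is laid out something like this: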

Sample textfile!

A man, l, a ballot, a catnip, a pooh, a rail, a calamus, a dairyman, a bater, a canal - Panama!

This is a sentence! This one also.

Heres another one?

Yes another one.

How can I put this file into a list, sentence by sentence?

Each sentence in my file ends as soon as a ! or . character is reached.

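There are many ways to do what you are asking. Here is one that reads the file into the program and splits each line into list entries on specific delimiters, while keeping each delimiter attached to its sentence.

All of the logic for turning the file into a list based on multiple delimiters is in the turnSentencesToList() method.

In the example below, I split on: ! . ?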

import java.io.File;
import java.io.FileNotFoundException;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test{
    
    public static void main(String [] args){
        LinkedList<String> list = turnSentencesToList("sampleFile.txt");
        
        for(String s: list)
            System.out.println(s);  
    }
    
    private static LinkedList<String> turnSentencesToList(String fileName) {
        LinkedList<String> list = new LinkedList<>();
        //backslashes must be doubled inside a Java string literal
        String regex = "\\.|!|\\?";
        
        File file = new File(fileName);
        Scanner scan = null;
        try {
            scan = new Scanner(file);
            while(scan.hasNextLine()){
                String line = scan.nextLine().trim();
                
                //we don't need empty lines
                if(!line.equals("")) {
                    //splits by . or ! or ?
                    String[] sentences = line.split(regex);
                    
                    //gather delims because split() removes them
                    List<String> delims = getDelimiters(line, regex);
                    
                    int count = 0;
                    for(String s: sentences) {
                        //a trailing fragment may have no delimiter after it
                        String delim = count < delims.size() ? delims.get(count) : "";
                        list.add(s.trim()+delim);
                        count++;
                    }
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            //return the empty list so the loop in main() does not hit a null
            return list;
        }finally {
            if(scan!=null)
                scan.close();
        }
        return list;
    }
    
    
     private static List<String> getDelimiters(String line, String regex) {
         //this method is used to provide a list of all found delimiters in a line
         List<String> allDelims = new LinkedList<String>();
         Pattern pattern = Pattern.compile(regex);
         Matcher matcher = pattern.matcher(line);
    
         String delim = null;
         while(matcher.find()) {
             delim = matcher.group();
             allDelims.add(delim);
         }
            
         return allDelims;     
    }
}

Given your sample input file, the output is:

Sample textfile!

A man, l, a ballot, a catnip, a pooh, a rail, a calamus, a dairyman, a bater, a canal - Panama!

This is a sentence!

This one also.

Heres another one?

Yes another one.
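
Since there are many ways to do this, here is a shorter variant of the same idea as a rough sketch. It assumes a lookbehind regex is acceptable: the split happens right after a delimiter, so the delimiter stays on its sentence and no separate delimiter list is needed. The class name SentenceSplitter and the methods splitSentences() and readSentences() are made up for this illustration and are not part of the code above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SentenceSplitter {

    // Lookbehind: match the position right after '.', '!' or '?',
    // plus any whitespace that follows, so the delimiter itself is not consumed
    private static final String AFTER_DELIMITER = "(?<=[.!?])\\s*";

    // Splits every non-blank line into sentences, keeping the closing delimiter
    static List<String> splitSentences(List<String> lines) {
        List<String> sentences = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            sentences.addAll(Arrays.asList(trimmed.split(AFTER_DELIMITER)));
        }
        return sentences;
    }

    // Reads all lines of the file (UTF-8) and splits them into sentences
    static List<String> readSentences(Path file) throws IOException {
        return splitSentences(Files.readAllLines(file));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // File name supplied on the command line, e.g. sampleFile.txt
            readSentences(Paths.get(args[0])).forEach(System.out::println);
        } else {
            // No file given: demonstrate on two lines from the sample input
            splitSentences(Arrays.asList(
                    "This is a sentence! This one also.",
                    "Heres another one?")).forEach(System.out::println);
        }
    }
}
```

To split only on ! and . as in the original question, change the character class to [.!]. Like the version above, this also splits inside things such as decimal numbers or "...", so it is only suitable for simple text.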