Java reading a file into a list
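I need to find a way to turn an input file into a list of sentences separated by multiple characters, or more specifically, periods and exclamation marks (! or .).
My input file is laid out like this: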
Sample textfile!
A man, l, a ballot, a catnip, a pooh, a rail, a calamus, a dairyman, a bater, a canal - Panama!
This is a sentence! This one also.
Heres another one?
Yes another one.
How can I put this file into a list sentence by sentence?
Each sentence in my file is complete as soon as a ! or . character has been passed.
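There are many ways to do what you are asking, but here is one way to read the file into the program and split each line into a list on specific delimiters while still keeping the delimiters in the sentences.
All of the logic for converting the file into a list based on multiple delimiters is in the turnSentencesToList() method.
In the example below, I split on: ! . ?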
import java.io.File;
import java.io.FileNotFoundException;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {
        LinkedList<String> list = turnSentencesToList("sampleFile.txt");
        // turnSentencesToList() returns null when the file cannot be found
        if (list == null) {
            return;
        }
        for (String s : list)
            System.out.println(s);
    }

    private static LinkedList<String> turnSentencesToList(String fileName) {
        LinkedList<String> list = new LinkedList<>();
        // backslashes must be doubled inside a Java string literal
        String regex = "\\.|!|\\?";
        File file = new File(fileName);
        // try-with-resources closes the Scanner even if reading fails
        try (Scanner scan = new Scanner(file)) {
            while (scan.hasNextLine()) {
                String line = scan.nextLine().trim();
                // we don't need empty lines
                if (line.isEmpty()) {
                    continue;
                }
                // splits by . or ! or ?
                String[] sentences = line.split(regex);
                // gather delims because split() removes them
                List<String> delims = getDelimiters(line, regex);
                int count = 0;
                for (String s : sentences) {
                    // text after the last delimiter has no delimiter to re-attach
                    String delim = count < delims.size() ? delims.get(count) : "";
                    list.add(s.trim() + delim);
                    count++;
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            return null;
        }
        return list;
    }

    // returns every delimiter found in the line, in order of appearance
    private static List<String> getDelimiters(String line, String regex) {
        List<String> allDelims = new LinkedList<>();
        Matcher matcher = Pattern.compile(regex).matcher(line);
        while (matcher.find()) {
            allDelims.add(matcher.group());
        }
        return allDelims;
    }
}
Given your sample input file, the output produced is:
Sample textfile!
A man, l, a ballot, a catnip, a pooh, a rail, a calamus, a dairyman, a bater, a canal - Panama!
This is a sentence!
This one also.
Heres another one?
Yes another one.
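As a possible alternative, the delimiters can be kept without collecting them separately by splitting on a lookbehind pattern. This is only a minimal sketch of that idea, not part of the code above: the class name SentenceSplitSketch and the helper splitKeepingDelimiters() are made up for illustration, and it works on a single line that has already been read from the file.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceSplitSketch {

    // Hypothetical helper: splits one line after every '.', '!' or '?'.
    static List<String> splitKeepingDelimiters(String line) {
        List<String> sentences = new ArrayList<>();
        // (?<=[.!?]) matches the empty position right after a delimiter,
        // so split() consumes only the whitespace that follows it and the
        // delimiter itself stays attached to the sentence before it.
        for (String part : line.split("(?<=[.!?])\\s*")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Prints "This is a sentence!" and "This one also." on separate lines.
        for (String s : splitKeepingDelimiters("This is a sentence! This one also.")) {
            System.out.println(s);
        }
    }
}
```

Calling this helper for each non-empty line inside the while loop would replace both the split() call and the getDelimiters() bookkeeping. Like the original, it treats every period as a sentence end, so abbreviations such as "e.g." would still be cut apart.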