Is there a way to process data from a text file containing main headings using a regular expression?
Below is a snippet of the text file's format:
Historical Sales for: 12th of October 2019, 11:37 am
PRODUCT NAME QUANTITY
Coke B 5
Historical Sales for: 21st of October 2019, 8:15 pm
PRODUCT NAME QUANTITY
Peanuts 2
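I only want to process the column labels and row values, excluding the main headings; in this case, Historical Sales for: 12th of October 2019, 11:37 am.
This is the code I wrote to process the text with the regular expression (\b):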
StringBuilder temporary = new StringBuilder();
InputStream inputStream = new FileInputStream(new File(FILE_NAME));
BufferedReader readFile = new BufferedReader(new InputStreamReader(inputStream));
String next;
while ((next = readFile.readLine()) != null) {
temporary.append(next).append("\n");
}
next = String.format("%13s", ""); // spacing for column headers
System.out.println(temporary.toString().replaceAll("(\b)", next));
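A note on the (\b) pattern above: inside a Java string literal, "\b" is the backspace character (U+0008), not the regex word boundary. The pattern "(\b)" therefore searches for a literal backspace and, in ordinary text, replaces nothing. To pass the word boundary to java.util.regex, the backslash has to be escaped as "\\b". The small class below is a hypothetical demo (the class and method names are illustrative only) that shows the difference on one row of the sample data.

```java
// Hypothetical demo class: how Java string escapes interact with the regex \b.
public class BoundaryEscapeDemo {

    // In source, "(\b)" is a regex containing a literal backspace character (U+0008).
    static String withBackspaceLiteral(String text) {
        return text.replaceAll("(\b)", "|");
    }

    // "\\b" hands the two characters \ and b to java.util.regex: a word boundary.
    static String withWordBoundary(String text) {
        return text.replaceAll("\\b", "|");
    }

    public static void main(String[] args) {
        System.out.println((int) "\b".charAt(0));              // prints 8
        System.out.println(withBackspaceLiteral("Coke B 5"));  // prints Coke B 5
        System.out.println(withWordBoundary("Coke B 5"));      // prints |Coke| |B| |5|
    }
}
```

Because \b is zero-width, the correctly escaped version inserts the replacement at every word boundary rather than between columns. That is why the answer below filters whole lines instead.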
If you only want to print lines such as:
PRODUCT NAME QUANTITY
Chips 2
Coke B 5
and so on, I suggest using Java 8 streams and filtering out the unwanted lines with the regular expressions below:
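import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.*;
public class RemoveHeadings {
    public static void main(String[] args) throws Exception {
        try (Stream<String> lines = Files.lines(Paths.get("file.txt"))) { // closes the file when done
            String collect = lines
                    .filter(line -> !line.matches("^Historical Sales for.*$") && !line.matches("^\\s*$")) // "\\s": regex whitespace
                    .map(line -> line + "\n")
                    .collect(Collectors.joining());
            System.out.println(collect);
        }
    }
}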
This way you will get:
PRODUCT NAME QUANTITY
Chips 2
Coke B 5
PRODUCT NAME QUANTITY
(...)
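One advantage of using streams is that .collect() can gather the lines directly into a List, for example with Collectors.toList(), instead of joining them into a single String.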
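As a sketch of that List variant, the class below applies the same two filters but collects into a List<String>. It works on an in-memory copy of the sample so it runs without file.txt. With a real file, the Stream<String> returned by Files.lines(Paths.get("file.txt")) could take the place of lines.stream(). HeadingFilter and withoutHeadings are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: the stream filters from the answer, collected into a List.
public class HeadingFilter {

    static List<String> withoutHeadings(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.matches("Historical Sales for.*")) // drop main headings
                .filter(line -> !line.matches("\\s*"))                   // drop blank lines
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Historical Sales for: 12th of October 2019, 11:37 am",
                "PRODUCT NAME QUANTITY",
                "Coke B 5",
                "Historical Sales for: 21st of October 2019, 8:15 pm",
                "PRODUCT NAME QUANTITY",
                "Peanuts 2");
        // prints [PRODUCT NAME QUANTITY, Coke B 5, PRODUCT NAME QUANTITY, Peanuts 2]
        System.out.println(withoutHeadings(sample));
    }
}
```

String.matches must match the entire line, so the ^ and $ anchors used in the answer are optional here.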
If you want to keep your example, you can do this:
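import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
public class KeepYourExample {
    public static void main(String[] args) throws Exception {
        StringBuilder temporaryData = new StringBuilder();
        try (BufferedReader readFile = new BufferedReader(new InputStreamReader(new FileInputStream("file.txt")))) {
            String next;
            while ((next = readFile.readLine()) != null) {
                temporaryData.append(next).append("\n");
            }
        }
        String spacing = String.format("%13s", ""); // spacing for column headers
        // "\b" in a Java literal is a backspace, not the regex word boundary "\\b", so this normally replaces nothing
        String formattedString = temporaryData.toString().replaceAll("(\b{3})", spacing);
        // (?m) lets ^ and $ match at every line instead of only at the start/end of the whole text
        String stringWithoutHeaders = formattedString
                .replaceAll("(?m)^Historical Sales for.*\n", "") // drop each heading line with its newline
                .replaceAll("(?m)^\\s*\n", "");                   // drop blank lines
        System.out.println(stringWithoutHeaders);
    }
}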
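Both versions above only remove the headings. If "process" means reading the row values, one possible next step is to match each data row with a capturing regex. The sketch below assumes that every data row is a product name, which may contain spaces (as in "Coke B"), followed by whitespace and an integer quantity at the end of the line. That layout is inferred from the snippet, not guaranteed for the real file. SalesRowParser, Row and parse are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turn data rows into (name, quantity) pairs.
public class SalesRowParser {

    // Assumption: name, whitespace, then an integer quantity at the end of the line.
    private static final Pattern ROW = Pattern.compile("^(.+?)\\s+(\\d+)\\s*$");

    static final class Row {
        final String name;
        final int quantity;
        Row(String name, int quantity) { this.name = name; this.quantity = quantity; }
        @Override public String toString() { return name + " -> " + quantity; }
    }

    static List<Row> parse(String text) {
        List<Row> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {          // \R matches any line break (Java 8+)
            if (line.startsWith("Historical Sales for")) {
                continue;                                  // skip main headings explicitly
            }
            Matcher m = ROW.matcher(line);
            // The "PRODUCT NAME QUANTITY" label has no trailing number, so it does not match
            if (m.matches()) {
                rows.add(new Row(m.group(1), Integer.parseInt(m.group(2))));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "Historical Sales for: 12th of October 2019, 11:37 am\n"
                + "PRODUCT NAME QUANTITY\n"
                + "Coke B 5\n"
                + "Historical Sales for: 21st of October 2019, 8:15 pm\n"
                + "PRODUCT NAME QUANTITY\n"
                + "Peanuts 2\n";
        // prints [Coke B -> 5, Peanuts -> 2]
        System.out.println(parse(text));
    }
}
```

The lazy (.+?) group lets the name include spaces while still leaving the last number for the quantity group.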