Is there a way to process data from a text file containing main headings using a regular expression?

Below is a snippet showing the structure of the text file:

Historical Sales for: 12th of October  2019, 11:37 am

PRODUCT NAME      QUANTITY
Coke B            5

Historical Sales for: 21st of October  2019, 8:15 pm

PRODUCT NAME      QUANTITY
Peanuts           2

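I only want to process the column labels and row values, not the main heading, which in this case is "Historical Sales for: 12th of October  2019, 11:37 am".

Here is the code I wrote to process the text using a regular expression (\b):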

        StringBuilder temporary = new StringBuilder();
   
        InputStream inputStream = new FileInputStream(new File(FILE_NAME));            
        BufferedReader readFile = new BufferedReader(new InputStreamReader(inputStream));
        
        String next; 
        
        while ((next = readFile.readLine()) != null) {
           temporary.append(next).append("\n");
        }   

        next = String.format("%13s", ""); // spacing for column headers          
        System.out.println(temporary.toString().replaceAll("(\\b)", next));

If you only want to print lines such as:

PRODUCT NAME      QUANTITY
Chips             2
Coke B            5

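and so on, I suggest using Java 8 streams and filtering out the unwanted lines with the regular expressions below:

// Requires: java.nio.file.Files, java.nio.file.Paths, java.util.stream.Collectors, java.util.stream.Stream
public static void main(String[] args) throws Exception {
    // try-with-resources closes the file once the stream has been consumed
    try (Stream<String> lines = Files.lines(Paths.get("file.txt"))) {
        String collect = lines
                // drop the main headings and blank lines
                .filter(line -> !line.matches("^Historical Sales for.*$") && !line.matches("^\\s*$"))
                .collect(Collectors.joining("\n"));
        System.out.println(collect);
    }
}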

This way you will get:

PRODUCT NAME      QUANTITY
Chips             2
Coke B            5
PRODUCT NAME      QUANTITY
(...)

One advantage of using streams is that the .collect() method lets you gather the lines directly into a List.
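
As a rough sketch of that idea, the filtered lines can be collected with Collectors.toList() instead of being joined into one string. The class name SalesLines, the helper dataLines, and the inline sample data are assumptions made for illustration; in practice the stream would come from Files.lines(...).

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SalesLines {
    // Hypothetical helper: keeps only the column-label and row lines.
    static List<String> dataLines(Stream<String> lines) {
        return lines
                .filter(line -> !line.matches("^Historical Sales for.*$")) // drop main headings
                .filter(line -> !line.matches("^\\s*$"))                   // drop blank lines
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Inline sample standing in for Files.lines(Paths.get("file.txt"))
        Stream<String> sample = Stream.of(
                "Historical Sales for: 12th of October  2019, 11:37 am",
                "",
                "PRODUCT NAME      QUANTITY",
                "Coke B            5");
        dataLines(sample).forEach(System.out::println);
    }
}
```

Having the rows in a List makes it straightforward to split each one into a product name and a quantity later on.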

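If you want to keep your own example, you can do it like this:

StringBuilder temporaryData = new StringBuilder();

// try-with-resources closes the reader (and the underlying stream) automatically
try (BufferedReader readFile = new BufferedReader(new InputStreamReader(new FileInputStream(new File("file.txt"))))) {
    String next;
    while ((next = readFile.readLine()) != null) {
        temporaryData.append(next).append("\n");
    }
}

// Strip the headings and blank lines first, before the spacing step changes the start of each line.
// (?m) makes ^ match at the start of every line instead of only at the start of the whole input.
String stringWithoutHeaders = temporaryData.toString()
        .replaceAll("(?m)^Historical Sales for.*\\R?", "")
        .replaceAll("(?m)^[ \\t]*\\R", "");

String spacing = String.format("%13s", ""); // spacing for column headers
// "\\b" is the regex word boundary; a bare "\b" in a Java string literal is a backspace character
String formattedString = stringWithoutHeaders.replaceAll("(\\b{3})", spacing);
System.out.println(formattedString);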
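
One detail worth illustrating: by default ^ and $ anchor to the whole input rather than to each line, so a heading pattern applied to a multi-line string needs the (?m) flag. The sketch below shows the difference on a small inline sample; the class name MultilineDemo and the helper stripHeadings are hypothetical.

```java
public class MultilineDemo {
    // Hypothetical helper: removes heading lines and blank lines from a multi-line string.
    static String stripHeadings(String text) {
        return text
                .replaceAll("(?m)^Historical Sales for.*\\R?", "") // heading plus its line break
                .replaceAll("(?m)^[ \\t]*\\R", "");                 // lines that are empty or whitespace only
    }

    public static void main(String[] args) {
        String text = "Historical Sales for: 12th of October  2019, 11:37 am\n"
                + "\n"
                + "PRODUCT NAME      QUANTITY\n"
                + "Coke B            5\n";
        // Without (?m) the pattern must span the entire input, so nothing is replaced.
        System.out.println(text.replaceAll("^Historical Sales for.*$", "").equals(text)); // prints true
        System.out.print(stripHeadings(text));
    }
}
```

Filtering line by line, as in the streams version, avoids this issue because String.matches is applied to one line at a time.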