How do I apply a regex to an entire file, not just line by line?

Question

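I want to apply a regular expression not only to the first line of a text file, but to all of its lines. Currently, it only matches when the entire match sits on a single line. If the match continues on the next line, it does not match at all.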

 import java.io.BufferedReader;
 import java.io.BufferedWriter;
 import java.io.FileReader;
 import java.io.FileWriter;
 import java.io.IOException;
 import java.io.PrintWriter;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 class Parser {
   public static void main(String[] args) throws IOException {

     // A trigger phrase followed by everything up to the next period
     Pattern patt = Pattern.compile("(include|"
             + "integrate|"
             + "driven based on|"
             + "facilitate through|"
             + "contain|"
             + "using|"
             + "equipped|"
             + "integrate|"
             + "implement|"
             + "utilized to facilitate|"
             + "comprise){1}"
             + "[\\s\\w\\,\\(\\)\\;\\:]*\\.");  // Regex

     String line;
     // Open the input file for reading and the output file for appending
     try (BufferedReader r = new BufferedReader(new FileReader("E:/test/test.txt"));
          PrintWriter pWriter = new PrintWriter(new BufferedWriter(new FileWriter("E:/test/test1.txt", true)))) {
       // Apply the pattern to each line separately
       while ((line = r.readLine()) != null) {
         Matcher matcher = patt.matcher(line);
         while (matcher.find()) {
           pWriter.println(matcher.group());  // write the match to the new file
           System.out.println(matcher.group());
         }
       }
     }
   }
 }

Answer 1

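Currently the matcher is applied per line. It needs to be applied to the whole file to work as expected.

The regex is greedy: once a keyword is found, the match runs on through the rest of the string until it hits a . (or some other character that the class does not allow):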

...
        + "comprise){1}"
        + "[\\s\\w\\,\\(\\)\\;\\:]*\\.");  //Regex

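The last line matches any whitespace and word characters (plus the listed punctuation), but not a period. The {1} and most of the backslashes are redundant (the backslashes because they are inside []):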

...
        + "comprise)"
        + "[\\s\\w,();:]*\\.");  //Regex
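
To check that dropping {1} and the extra backslashes changes nothing, the two forms can be compared on a sample string. This is only a sketch: the class name EscapeCheck, the helper firstMatch, the shortened keyword list, and the sample sentence are all assumptions made for illustration, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeCheck {
    // Original style: {1} after the group and every punctuation character escaped.
    static final Pattern VERBOSE =
            Pattern.compile("(contain|comprise){1}[\\s\\w\\,\\(\\)\\;\\:]*\\.");
    // Simplified style: no {1}, and no escapes for , ( ) ; : inside the brackets.
    static final Pattern SIMPLE =
            Pattern.compile("(contain|comprise)[\\s\\w,();:]*\\.");

    // Returns the first match of the pattern in the text, or null if there is none.
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "The kits comprise a pump (small); a valve, and a hose. Other text.";
        // Both patterns should print the same match, ending at the first period.
        System.out.println(firstMatch(VERBOSE, text));
        System.out.println(firstMatch(SIMPLE, text));
    }
}
```

The match stops at the first period because . is not in the character class, so the greedy quantifier cannot run past it.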

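If you don't care about line breaks, just remove them first and it should work (if you have something like "com\nprise" and want to match it, I don't think there is another way around it):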

s = s.replaceAll("\n+", "");
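
The effect of stripping line breaks can be seen on a small string. The following is a sketch under assumptions: LineBreakDemo, stripBreaks, and firstMatch are hypothetical names, the keyword list is shortened, and \R (available since Java 8) is swapped in for \n so that Windows-style \r\n line endings are removed as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineBreakDemo {
    static final Pattern PATT = Pattern.compile("(contain|comprise)[\\s\\w,();:]*\\.");

    // Deletes every line break; \R matches \n, \r\n, and \r.
    static String stripBreaks(String s) {
        return s.replaceAll("\\R+", "");
    }

    // Returns the first match in the text, or null if there is none.
    static String firstMatch(String text) {
        Matcher m = PATT.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "The kits com\r\nprise a pump\r\nand a hose.";
        // The keyword is split across two lines, so nothing matches here.
        System.out.println(firstMatch(s));
        // After the line breaks are deleted, the keyword is whole again.
        System.out.println(firstMatch(stripBreaks(s)));
    }
}
```

Note the trade-off: deleting line breaks also fuses the last word of one line with the first word of the next ("pump" and "and" above). Replacing them with a space would keep words apart, but would not rejoin a keyword that is split across lines.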

Answer 2

Change the while ((line = r.readLine()) != null) loop to:

StringBuilder file = new StringBuilder(); // Accumulates every line of the file
while ((line = r.readLine()) != null) {
    // readLine() strips the line terminator, so add one back to keep
    // words on adjacent lines apart; \s in the regex matches it
    file.append(line).append('\n');
}
Matcher matcher = patt.matcher(file);
while (matcher.find()) {
    /* Your output code goes here. */
}

This happens because you are processing each line separately. For what you want, you need to apply the regex to the whole file in one go.
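
As an alternative to accumulating lines by hand, the file can be read in a single call and matched as one string. This is a minimal sketch, not the asker's code: WholeFileMatcher, findAll, and findAllInFile are hypothetical names, the keyword list is shortened, the file is assumed to be UTF-8, and the demo writes its own temporary file instead of using E:/test/test.txt.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeFileMatcher {
    static final Pattern PATT = Pattern.compile("(contain|comprise)[\\s\\w,();:]*\\.");

    // Collects every match found in the given text.
    static List<String> findAll(CharSequence text) {
        List<String> matches = new ArrayList<>();
        Matcher m = PATT.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // Reads the whole file into one string (assumed UTF-8) and matches against it.
    static List<String> findAllInFile(Path path) throws IOException {
        String content = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        return findAll(content);
    }

    public static void main(String[] args) throws IOException {
        // Demo input: the sentence continues on the second line.
        Path tmp = Files.createTempFile("regex-demo", ".txt");
        Files.write(tmp, "The kits comprise a pump\nand a hose.\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(findAllInFile(tmp));
        Files.delete(tmp);
    }
}
```

Because line terminators stay in the string and \s is part of the character class, a match can continue across a line break. Reading everything into memory is reasonable for small files; very large inputs would need a different approach.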


java

regex

matcher