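How to apply regex to entire file, not just line after line?
I want to apply the regex not just to the first line of a text file, but across all of its lines.
At the moment it only matches when the whole match sits on a single line. If the match continues onto the next line, it does not match at all.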
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class Parser {
        public static void main(String[] args) throws IOException {
            Pattern patt = Pattern.compile("(include|"
                    + "integrate|"
                    + "driven based on|"
                    + "facilitate through|"
                    + "contain|"
                    + "using|"
                    + "equipped|"
                    + "integrate|"
                    + "implement|"
                    + "utilized to facilitate|"
                    + "comprise){1}"
                    + "[\\s\\w\\,\\(\\)\\;\\:]*\\."); // Regex
            BufferedReader r = new BufferedReader(new FileReader("E:/test/test.txt")); // read the file
            String line;
            PrintWriter pWriter = null;
            while ((line = r.readLine()) != null) {
                Matcher matcher = patt.matcher(line); // the matcher only ever sees one line
                while (matcher.find()) {
                    try {
                        pWriter = new PrintWriter(new BufferedWriter(new FileWriter("E:/test/test1.txt", true))); // append any given input
                        pWriter.println(matcher.group()); // write the result of matcher to the new file
                    } catch (IOException ioe) {
                        ioe.printStackTrace();
                    } finally {
                        if (pWriter != null) {
                            pWriter.flush();
                            pWriter.close();
                        }
                    }
                    System.out.println(matcher.group());
                }
            }
        }
    }
Currently the matcher is applied line by line; it needs to be applied to the whole file to work as expected.
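The regex is greedy: the first match will swallow the whole string unless the string contains a . (or another character that is not in the character class):
    ...
            + "comprise){1}"
            + "[\\s\\w\\,\\(\\)\\;\\:]*\\."); // Regex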
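The last line matches any run of whitespace and word characters (plus the listed punctuation), up to the next . The {1} and most of the backslashes are redundant (the punctuation needs no escaping inside []):
    ...
            + "comprise)"
            + "[\\s\\w,();:]*\\."); // Regex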
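To check that dropping {1} and the extra backslashes changes nothing, here is a minimal sketch that runs both forms against the same text. The class name CharClassDemo, the helper firstMatch, the sample sentence, and the shortened keyword list are all made up for illustration and are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Verbose form from the question (keyword list shortened for the example).
    static final Pattern VERBOSE =
            Pattern.compile("(include|comprise){1}[\\s\\w\\,\\(\\)\\;\\:]*\\.");

    // Simplified form: no {1}, no escapes on punctuation inside the class.
    static final Pattern SIMPLE =
            Pattern.compile("(include|comprise)[\\s\\w,();:]*\\.");

    /** Returns the first match of the pattern in the text, or null if there is none. */
    static String firstMatch(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String text = "The kit may include bolts, nuts (M4); and washers. Nothing else.";
        // Both lines print the same match, ending at the first full stop.
        System.out.println(firstMatch(VERBOSE, text));
        System.out.println(firstMatch(SIMPLE, text));
    }
}
```

The greedy character class stops at the first full stop because . is not a member of the class, which is why the match ends there rather than running to the end of the text.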
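If you do not care about line breaks, just remove them first and it should work (if you have something like "com\nprise" and want it to match, I see no way around doing this):
    s = s.replaceAll("\n+", "");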
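One thing worth seeing before choosing how to strip line breaks: deleting them glues neighbouring words together, while replacing them with a space keeps words apart but leaves a keyword that was split across lines unmatched. The sketch below shows both on a made-up string. The class LineBreakDemo and its two helpers are hypothetical names, and the character class [\r\n] is an assumption added to cover Windows line endings, which the one-liner above does not handle.

```java
class LineBreakDemo {
    /** Deletes line breaks outright, so words on adjacent lines get glued together. */
    static String removeBreaks(String s) {
        return s.replaceAll("[\\r\\n]+", "");
    }

    /** Replaces each run of line breaks with one space, keeping words apart. */
    static String breaksToSpace(String s) {
        return s.replaceAll("[\\r\\n]+", " ");
    }

    public static void main(String[] args) {
        String s = "The parts include\r\nbolts and com\nprise nuts.";
        System.out.println(removeBreaks(s));  // "comprise" is restored, "includebolts" is glued
        System.out.println(breaksToSpace(s)); // words stay apart, "com prise" stays split
    }
}
```

Which variant fits depends on whether keywords can really be broken in the middle of a word in the input; if they cannot, the space variant is probably the safer default.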
Change the while ((line = r.readLine()) != null) loop to:
    StringBuilder file = new StringBuilder(); // accumulates every line of the file
    while ((line = r.readLine()) != null) {
        file.append(line).append('\n'); // readLine() strips the line terminator, so put one back
    }
    Matcher matcher = patt.matcher(file);
    while (matcher.find()) {
        /* Your output code goes here. */
    }
This happens because you are processing each line separately. To get what you want, you need to apply the regex to the whole file at once.
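Putting the pieces together, a whole-file version might look like the sketch below. It is one possible arrangement, not the only one: the class WholeFileParser and the methods findAll and run are hypothetical names, the keyword list is shortened, UTF-8 is assumed for both files, and the input and output paths are taken from the command line instead of being hard-coded. It reads the file with Files.readAllBytes, which is fine for files that fit comfortably in memory.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeFileParser {
    // Shortened keyword list; \s inside the class also matches \r and \n,
    // so a match can run across line breaks.
    static final Pattern PATT = Pattern.compile(
            "(include|integrate|contain|using|equipped|implement|comprise)[\\s\\w,();:]*\\.");

    /** Returns every match in the text, in order of appearance. */
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = PATT.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    /** Reads the input in one go and appends one match per line to the output. */
    static void run(Path in, Path out) throws IOException {
        String text = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        // Open the writer once, in append mode, instead of once per match.
        try (PrintWriter writer = new PrintWriter(Files.newBufferedWriter(
                out, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            for (String match : findAll(text)) {
                // Collapse embedded line breaks so each match stays on one output line.
                writer.println(match.replaceAll("\\s+", " "));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java WholeFileParser <input> <output>");
            return;
        }
        run(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

With the paths from the question this would be invoked as java WholeFileParser E:/test/test.txt E:/test/test1.txt. Opening the writer once with try-with-resources also removes the need for the manual flush and close in a finally block.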