Regex help for extracting text/pattern between two keywords
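I am trying to extract the text between two keywords.
The pattern below repeats throughout the text document, with variations between the start keyword and the end keyword.
The document also has paragraphs and other text before and after this pattern, which I do not want to extract.
Can anyone help me with a regex that extracts all occurrences of the pattern below?
Start keyword: RIASWIX
End keyword: Sky Access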
----Document Start-------
Paragraph*
RIASWIX.* ABCDEF1 NONE
WORKING: HELLO(READ)
BOOLEAN Access: SADGRE3, VJFKES3, JGJKEWW, IS4DWF44(A), DFEAWE2(G),
DW4444W, IHFK3MF3
BAZAAR Access: No resource with BAZAAR Access
GHAR Access: No resource with GHAR Access
WATER Access: ADMINDDD(A), GEDDE33
SKY None: No Resource with Sky Access
RIASWIX.@7483NFJ.* HFDFDF3 NONE
WORKING: BYE(READ)
BOOLEAN Access: GRREGGG, GREFEFF, GFGGGG, FDFDFDF(A), RERERE3(G),
GFFWEF44, FFRF44F
BAZAAR Access: No resource with BAZAAR Access
GHAR Access: No resource with GHAR Access
WATER Access: ADMINEWW(A), FFRFRGR
SKY None: No Resource with Sky Access
RIASWIX.@7483KXX.* HFDFDF3 NONE
WORKING: TATA(READ)
BOOLEAN Access: GRDSD33, FASDE, GFGGGG, RWERW33(A), NMUYHT4(G),
BAZAAR Access: XCDFEFE3, FREFE33R
GHAR Access: No resource with GHAR Access
WATER Access: DASDEFG(A), SJMFEIOE(P)
SKY None: No Resource with Sky Access
*Text
----Document End-------
Use (?s) so that . also matches newlines; see regex-match-all-characters-between-two-strings
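import re
# str1 holds the document text; re.DOTALL lets "." match newlines (an inline (?s) in the middle of a pattern is rejected by Python 3.11+)
print(re.findall(r'RIASWIX(.*?)Sky Access', str1, flags=re.DOTALL))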
You tagged the question with both Python and Java. I can answer for Java.
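Regex implementation:
If you need to exclude the start and end keywords from each match, use a positive lookbehind and a positive lookahead, so that RIASWIX and Sky Access are required but not included in the match.
In addition, use a reluctant quantifier to match the text between each pair of keywords; otherwise you would match everything between the first start keyword and the last end keyword.
Finally, enable the DOTALL flag so that the regex can match text spanning multiple lines.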
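To make the effect of the reluctant quantifier and the DOTALL flag concrete, here is a minimal Python sketch. The `text` value is a hypothetical, shortened stand-in for the document in the question, not the real input.

```python
import re

# Hypothetical miniature of the document: two records surrounded by unrelated text.
text = (
    "Paragraph\n"
    "RIASWIX.* A1 NONE\n"
    "  SKY None: No Resource with Sky Access\n"
    "RIASWIX.@1.* B2 NONE\n"
    "  SKY None: No Resource with Sky Access\n"
    "Text\n"
)

# Greedy: .* runs to the LAST "Sky Access", so both records collapse into one match.
greedy = re.findall(r"RIASWIX.*Sky Access", text, flags=re.DOTALL)

# Reluctant: .*? stops at the FIRST "Sky Access" after each start keyword.
reluctant = re.findall(r"RIASWIX.*?Sky Access", text, flags=re.DOTALL)

# Without DOTALL, "." cannot cross a newline, so no record matches at all.
no_dotall = re.findall(r"RIASWIX.*?Sky Access", text)

print(len(greedy), len(reluctant), len(no_dotall))  # prints: 1 2 0
```

The same behaviour applies to the Java patterns in this answer, since `.*?` and DOTALL mean the same thing in both regex flavours.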
Implementation excluding the keywords
https://regex101.com/r/6Lnm5i/1
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "... your text to parse ....";
// Compile the regex with DOTALL enabled. Alternatively, enable the flag inline by putting (?s) at the start of the regex
Pattern regex = Pattern.compile("(?<=RIASWIX).*?(?=Sky Access)", Pattern.DOTALL);
// Create a matcher from the regex and the text to parse
Matcher matcher = regex.matcher(text);
// Loop while there are still occurrences
while (matcher.find()) {
    // Print the occurrence (keywords excluded)
    System.out.println(matcher.group());
}
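For readers following the Python tag, the lookaround pattern can be compared with a plain capturing group. This is only a sketch on a hypothetical one-line `text`; it suggests that, for this kind of input, both approaches return the same text between the keywords.

```python
import re

# Hypothetical sample with two keyword pairs; the first pair spans a newline.
text = "intro RIASWIX one\ntwo Sky Access middle RIASWIX three Sky Access outro"

# Lookbehind/lookahead: the keywords must be present but are not part of the match.
lookaround = re.compile(r"(?<=RIASWIX).*?(?=Sky Access)", re.DOTALL)

# Capturing group: the keywords are matched, but findall returns only the group.
group = re.compile(r"RIASWIX(.*?)Sky Access", re.DOTALL)

between_a = lookaround.findall(text)
between_b = group.findall(text)

print(between_a)               # prints: [' one\ntwo ', ' three ']
print(between_a == between_b)  # prints: True
```

The capturing-group form is often easier to read, while the lookaround form keeps the whole match equal to the wanted text, which is convenient with `matcher.group()` in Java.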
Implementation including the keywords
https://regex101.com/r/6RYTYf/1
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "... your text to parse ....";
// Compile the regex with DOTALL enabled. Alternatively, enable the flag inline by putting (?s) at the start of the regex
Pattern regex = Pattern.compile("RIASWIX.*?Sky Access", Pattern.DOTALL);
// Create a matcher from the regex and the text to parse
Matcher matcher = regex.matcher(text);
// Loop while there are still occurrences
while (matcher.find()) {
    // Print the occurrence (keywords included)
    System.out.println(matcher.group());
}
Alternative regex (written as a Java string literal, so the word boundary is \\b):
"^RIASWIX.*?\\bSky Access\\b"
The regex in context, with a test harness:
public static void main(String[] args) {
    String input = getInput();
    // "\\b" is a regex word boundary; a single "\b" in a Java string literal would be a backspace character
    Matcher matcher = Pattern
            .compile("^RIASWIX.*?\\bSky Access\\b", Pattern.MULTILINE | Pattern.DOTALL)
            .matcher(input);
    while (matcher.find()) {
        System.out.println("=== === === START === ==== ===");
        System.out.println(matcher.group());
        System.out.println("=== === === END === ==== ===\n");
    }
}
Input from the document:
private static String getInput() {
    return "----Document Start-------\n" +
            "\n" +
            "Paragraph*\n" +
            "\n" +
            "RIASWIX.* ABCDEF1 NONE\n" +
            " WORKING: HELLO(READ)\n" +
            " BOOLEAN Access: SADGRE3, VJFKES3, JGJKEWW, IS4DWF44(A), DFEAWE2(G),\n" +
            " DW4444W, IHFK3MF3\n" +
            " BAZAAR Access: No resource with BAZAAR Access\n" +
            " GHAR Access: No resource with GHAR Access\n" +
            " WATER Access: ADMINDDD(A), GEDDE33\n" +
            " SKY None: No Resource with Sky Access\n" +
            "\n" +
            "RIASWIX.@7483NFJ.* HFDFDF3 NONE\n" +
            " WORKING: BYE(READ)\n" +
            " BOOLEAN Access: GRREGGG, GREFEFF, GFGGGG, FDFDFDF(A), RERERE3(G),\n" +
            " GFFWEF44, FFRF44F\n" +
            " BAZAAR Access: No resource with BAZAAR Access\n" +
            " GHAR Access: No resource with GHAR Access\n" +
            " WATER Access: ADMINEWW(A), FFRFRGR\n" +
            " SKY None: No Resource with Sky Access\n" +
            "\n" +
            "RIASWIX.@7483KXX.* HFDFDF3 NONE\n" +
            " WORKING: TATA(READ)\n" +
            " BOOLEAN Access: GRDSD33, FASDE, GFGGGG, RWERW33(A), NMUYHT4(G),\n" +
            " BAZAAR Access: XCDFEFE3, FREFE33R\n" +
            " GHAR Access: No resource with GHAR Access\n" +
            " WATER Access: DASDEFG(A), SJMFEIOE(P)\n" +
            " SKY None: No Resource with Sky Access\n" +
            "\n" +
            "*Text\n" +
            "\n" +
            "----Document End-------";
}
Output:
=== === === START === ==== ===
RIASWIX.* ABCDEF1 NONE
WORKING: HELLO(READ)
BOOLEAN Access: SADGRE3, VJFKES3, JGJKEWW, IS4DWF44(A), DFEAWE2(G),
DW4444W, IHFK3MF3
BAZAAR Access: No resource with BAZAAR Access
GHAR Access: No resource with GHAR Access
WATER Access: ADMINDDD(A), GEDDE33
SKY None: No Resource with Sky Access
=== === === END === ==== ===
=== === === START === ==== ===
RIASWIX.@7483NFJ.* HFDFDF3 NONE
WORKING: BYE(READ)
BOOLEAN Access: GRREGGG, GREFEFF, GFGGGG, FDFDFDF(A), RERERE3(G),
GFFWEF44, FFRF44F
BAZAAR Access: No resource with BAZAAR Access
GHAR Access: No resource with GHAR Access
WATER Access: ADMINEWW(A), FFRFRGR
SKY None: No Resource with Sky Access
=== === === END === ==== ===
=== === === START === ==== ===
RIASWIX.@7483KXX.* HFDFDF3 NONE
WORKING: TATA(READ)
BOOLEAN Access: GRDSD33, FASDE, GFGGGG, RWERW33(A), NMUYHT4(G),
BAZAAR Access: XCDFEFE3, FREFE33R
GHAR Access: No resource with GHAR Access
WATER Access: DASDEFG(A), SJMFEIOE(P)
SKY None: No Resource with Sky Access
=== === === END === ==== ===
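The alternative pattern differs from the earlier ones in that `^` (with MULTILINE) only lets a match begin at the start of a line, and `\b` requires word boundaries around the end keyword. A small Python sketch of why that can matter follows; the `text` below is a hypothetical input in which the surrounding paragraph happens to mention the start keyword, which the sample document in the question does not do.

```python
import re

# Hypothetical input: the leading paragraph mentions RIASWIX mid-line.
text = (
    "A paragraph that mentions RIASWIX in passing.\n"
    "RIASWIX.* A1 NONE\n"
    "  SKY None: No Resource with Sky Access\n"
    "Trailing text\n"
)

# Anchored: a match may only begin where RIASWIX starts a line.
anchored = re.compile(r"^RIASWIX.*?\bSky Access\b", re.MULTILINE | re.DOTALL)

# Unanchored: a match begins at the first RIASWIX found anywhere.
unanchored = re.compile(r"RIASWIX.*?Sky Access", re.DOTALL)

anchored_first_line = anchored.findall(text)[0].splitlines()[0]
unanchored_first_line = unanchored.findall(text)[0].splitlines()[0]

print(anchored_first_line)    # prints: RIASWIX.* A1 NONE
print(unanchored_first_line)  # prints: RIASWIX in passing.
```

If the real records are indented, the anchor would need to allow leading whitespace, for example `^\s*RIASWIX`; whether that applies depends on the actual file.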