Regex help for text/pattern between two keywords

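I am trying to extract the text between two words. The pattern below repeats throughout a text document, with variations between the 'start keyword' and the 'end keyword'. The document also has paragraphs and other text before and after the pattern, which I do not want to extract. Can anyone help me with a regex that extracts all occurrences?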

Start keyword: RIASWIX. End keyword: Sky Access.

----Document Start-------

Paragraph*

RIASWIX.*                                 ABCDEF1   NONE
   WORKING:  HELLO(READ)
   BOOLEAN Access:  SADGRE3, VJFKES3, JGJKEWW, IS4DWF44(A), DFEAWE2(G),
     DW4444W, IHFK3MF3
   BAZAAR Access:  No resource with BAZAAR Access
   GHAR Access:  No resource with GHAR Access
   WATER Access:  ADMINDDD(A), GEDDE33
   SKY None:  No Resource with Sky Access

RIASWIX.@7483NFJ.*                                 HFDFDF3   NONE
   WORKING:  BYE(READ)
   BOOLEAN Access:  GRREGGG, GREFEFF, GFGGGG, FDFDFDF(A), RERERE3(G),
     GFFWEF44, FFRF44F
   BAZAAR Access:  No resource with BAZAAR Access
   GHAR Access:  No resource with GHAR Access
   WATER Access:  ADMINEWW(A), FFRFRGR
   SKY None:  No Resource with Sky Access

RIASWIX.@7483KXX.*                                 HFDFDF3   NONE
   WORKING:  TATA(READ)
   BOOLEAN Access:  GRDSD33, FASDE, GFGGGG, RWERW33(A), NMUYHT4(G),
   BAZAAR Access:  XCDFEFE3, FREFE33R
   GHAR Access:  No resource with GHAR Access
   WATER Access:  DASDEFG(A), SJMFEIOE(P)
   SKY None:  No Resource with Sky Access

*Text

----Document End-------

Use (?s) so that . also matches line breaks; see the related question regex-match-all-characters-between-two-strings.

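import re

# str1 holds the document text. The inline (?s) flag must sit at the very start
# of the pattern: Python 3.11+ raises an error when it appears mid-pattern.
print(re.findall(r'(?s)RIASWIX(.*?)Sky Access', str1))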
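To try the idea without the real document, here is a minimal runnable sketch. The `sample` string is a made-up, shortened stand-in for the text in the question, and `blocks` is just an illustrative name; the DOTALL flag is passed through `flags=` instead of inline, which should behave the same way.

```python
import re

# Shortened, made-up stand-in for the document in the question.
sample = (
    "Paragraph\n"
    "\n"
    "RIASWIX.*   ABCDEF1   NONE\n"
    "   WORKING:  HELLO(READ)\n"
    "   SKY None:  No Resource with Sky Access\n"
    "\n"
    "RIASWIX.@7483NFJ.*   HFDFDF3   NONE\n"
    "   WORKING:  BYE(READ)\n"
    "   SKY None:  No Resource with Sky Access\n"
    "\n"
    "Text\n"
)

# re.DOTALL lets "." match newlines; ".*?" is lazy, so each match stops
# at the nearest "Sky Access" instead of the last one in the document.
blocks = re.findall(r"RIASWIX(.*?)Sky Access", sample, flags=re.DOTALL)

for block in blocks:
    print(repr(block))
```

Because the pattern has one capturing group, `re.findall` returns only the text between the two keywords, not the keywords themselves.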

You tagged your question with both Python and Java. I can answer for Java.

Regex implementation:

  • If you need to exclude the start and end keywords from each match, use a positive lookbehind and a positive lookahead to require RIASWIX and Sky Access without including them in the match.

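  • Then, use a reluctant (lazy) quantifier to match the text between one pair of keywords; otherwise you would match the entire text between the first and the last keyword.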

  • Finally, your regex should enable the DOTALL flag so that . also matches line breaks and a match can span multiple lines.
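The three points above can be checked quickly in Python as well (the thread is tagged with both languages). This is only a sketch: the `sample` text and the names `lazy`, `greedy` and `no_dotall` are made up for illustration and are not part of the original answer.

```python
import re

# Made-up sample with two multi-line records; "A"/"B" stand in for real content.
sample = (
    "intro\n"
    "RIASWIX A\n  x\n  Sky Access\n"
    "middle\n"
    "RIASWIX B\n  y\n  Sky Access\n"
    "outro"
)

# Lookbehind/lookahead assert the keywords without consuming them,
# so the keywords are not part of the returned matches.
lazy = re.findall(r"(?<=RIASWIX).*?(?=Sky Access)", sample, flags=re.DOTALL)

# Greedy quantifier: a single match running from the first RIASWIX
# to the last Sky Access, swallowing "middle" on the way.
greedy = re.findall(r"(?<=RIASWIX).*(?=Sky Access)", sample, flags=re.DOTALL)

# Without DOTALL, "." cannot cross a newline, so nothing matches here.
no_dotall = re.findall(r"(?<=RIASWIX).*?(?=Sky Access)", sample)

print(lazy)       # two separate records
print(greedy)     # one oversized match
print(no_dotall)  # empty list
```

Note that Python's `re` only accepts fixed-width lookbehinds; `RIASWIX` is a fixed string, so that restriction does not matter here.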

Implementation excluding the keywords

https://regex101.com/r/6Lnm5i/1

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "... your text to parse ....";

//Creating a regex with the DOTALL mode enabled. Alternatively, you could set the flag inside the regex by adding (?s) at its beginning
Pattern regex = Pattern.compile("(?<=RIASWIX).*?(?=Sky Access)", Pattern.DOTALL);

//Creating a matcher built on your regex and the text to parse
Matcher matcher = regex.matcher(text);

//While there are still occurrences
while(matcher.find()){
    //Printing the occurrence
    System.out.println(matcher.group());
}

Implementation including the keywords

https://regex101.com/r/6RYTYf/1

String text = "... your text to parse ....";

//Creating a regex with the DOTALL mode enabled. Alternatively, you could set the flag inside the regex by adding (?s) at its beginning
Pattern regex = Pattern.compile("RIASWIX.*?Sky Access", Pattern.DOTALL);

//Creating a matcher built on your regex and the text to parse
Matcher matcher = regex.matcher(text);

//While there are still occurrences
while(matcher.find()){
    //Printing the occurrence
    System.out.println(matcher.group());
}

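Alternative regex, written as a Java string literal (so the word boundary \b is escaped as \\b; a bare "\b" in a Java string is a backspace character). The ^ anchors RIASWIX to the start of a line when MULTILINE is enabled:

"^RIASWIX.*?\\bSky Access\\b"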

The regex in context, with a small test harness:

public static void main(String[] args) {
    String input = getInput();

    Matcher matcher = Pattern
            .compile("^RIASWIX.*?\\bSky Access\\b", Pattern.MULTILINE | Pattern.DOTALL)
            .matcher(input);

    while(matcher.find()) {
        System.out.println("=== === === START === ==== ===");
        System.out.println(matcher.group());
        System.out.println("=== === === END === ==== ===\n");
    }
}

Input from the document:

private static String getInput() {
    return "----Document Start-------\n" +
            "\n" +
            "Paragraph*\n" +
            "\n" +
            "RIASWIX.*                                 ABCDEF1   NONE\n" +
            "   WORKING:  HELLO(READ)\n" +
            "   BOOLEAN Access:  SADGRE3, VJFKES3, JGJKEWW, IS4DWF44(A), DFEAWE2(G),\n" +
            "     DW4444W, IHFK3MF3\n" +
            "   BAZAAR Access:  No resource with BAZAAR Access\n" +
            "   GHAR Access:  No resource with GHAR Access\n" +
            "   WATER Access:  ADMINDDD(A), GEDDE33\n" +
            "   SKY None:  No Resource with Sky Access\n" +
            "\n" +
            "RIASWIX.@7483NFJ.*                                 HFDFDF3   NONE\n" +
            "   WORKING:  BYE(READ)\n" +
            "   BOOLEAN Access:  GRREGGG, GREFEFF, GFGGGG, FDFDFDF(A), RERERE3(G),\n" +
            "     GFFWEF44, FFRF44F\n" +
            "   BAZAAR Access:  No resource with BAZAAR Access\n" +
            "   GHAR Access:  No resource with GHAR Access\n" +
            "   WATER Access:  ADMINEWW(A), FFRFRGR\n" +
            "   SKY None:  No Resource with Sky Access\n" +
            "\n" +
            "RIASWIX.@7483KXX.*                                 HFDFDF3   NONE\n" +
            "   WORKING:  TATA(READ)\n" +
            "   BOOLEAN Access:  GRDSD33, FASDE, GFGGGG, RWERW33(A), NMUYHT4(G),\n" +
            "   BAZAAR Access:  XCDFEFE3, FREFE33R\n" +
            "   GHAR Access:  No resource with GHAR Access\n" +
            "   WATER Access:  DASDEFG(A), SJMFEIOE(P)\n" +
            "   SKY None:  No Resource with Sky Access\n" +
            "\n" +
            "*Text\n" +
            "\n" +
            "----Document End-------";
}

Output:

=== === === START === ==== ===
RIASWIX.*                                 ABCDEF1   NONE
   WORKING:  HELLO(READ)
   BOOLEAN Access:  SADGRE3, VJFKES3, JGJKEWW, IS4DWF44(A), DFEAWE2(G),
     DW4444W, IHFK3MF3
   BAZAAR Access:  No resource with BAZAAR Access
   GHAR Access:  No resource with GHAR Access
   WATER Access:  ADMINDDD(A), GEDDE33
   SKY None:  No Resource with Sky Access
=== === === END === ==== ===

=== === === START === ==== ===
RIASWIX.@7483NFJ.*                                 HFDFDF3   NONE
   WORKING:  BYE(READ)
   BOOLEAN Access:  GRREGGG, GREFEFF, GFGGGG, FDFDFDF(A), RERERE3(G),
     GFFWEF44, FFRF44F
   BAZAAR Access:  No resource with BAZAAR Access
   GHAR Access:  No resource with GHAR Access
   WATER Access:  ADMINEWW(A), FFRFRGR
   SKY None:  No Resource with Sky Access
=== === === END === ==== ===

=== === === START === ==== ===
RIASWIX.@7483KXX.*                                 HFDFDF3   NONE
   WORKING:  TATA(READ)
   BOOLEAN Access:  GRDSD33, FASDE, GFGGGG, RWERW33(A), NMUYHT4(G),
   BAZAAR Access:  XCDFEFE3, FREFE33R
   GHAR Access:  No resource with GHAR Access
   WATER Access:  DASDEFG(A), SJMFEIOE(P)
   SKY None:  No Resource with Sky Access
=== === === END === ==== ===
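
For comparison, the anchored variant can be sketched in Python too. The point it tries to show is what the line-start anchor buys you: a stray mention of RIASWIX inside a paragraph no longer starts a match. The `sample` text (including the sentence that mentions RIASWIX in passing) and the names `unanchored` and `anchored` are assumptions made up for this illustration.

```python
import re

# Made-up sample: the first paragraph mentions the start keyword mid-line.
sample = (
    "Paragraph mentioning RIASWIX in passing.\n"
    "\n"
    "RIASWIX.*   ABCDEF1   NONE\n"
    "   WORKING:  HELLO(READ)\n"
    "   SKY None:  No Resource with Sky Access\n"
    "\n"
    "Text\n"
)

# Unanchored: the match begins at the stray RIASWIX in the paragraph.
unanchored = re.findall(r"RIASWIX.*?Sky Access", sample, flags=re.DOTALL)

# Anchored: with MULTILINE, "^" matches at the start of every line, so only
# a RIASWIX that opens a line can start a match; \b requires word boundaries
# around "Sky Access".
anchored = re.findall(
    r"^RIASWIX.*?\bSky Access\b", sample, flags=re.MULTILINE | re.DOTALL
)

print(unanchored[0].splitlines()[0])
print(anchored[0].splitlines()[0])
```

In a raw Python string `\b` reaches the regex engine unchanged, whereas a Java string literal needs `\\b` for the same effect.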