How to extract the substring between two strings, including the matching/enclosing strings?

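I have an XML file and need to extract one element from it (on a Linux machine). Unfortunately the machine does not have the "xsltproc" command (and I can't install it), so I'm trying to work out how to do the extraction with other available tools (e.g. sed).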

Here is a sample of the XML:

<l7:List xmlns:l7="http://ns.l7tech.com/2010/04/gateway-management">
    <l7:Name>REVOCATION_CHECK_POLICY List</l7:Name>
    <l7:Type>List</l7:Type>
    <l7:TimeStamp>2022-05-30T12:36:16.994Z</l7:TimeStamp>
    <l7:Link rel="self" uri="https://myhost02.xxxx.com:8443/restman/1.0/revocationCheckingPolicies"/>
    <l7:Link rel="template" uri="https://myhost02.xxxx.com:8443/restman/1.0/revocationCheckingPolicies/template"/>
    <l7:Item>
        <l7:Name>OCSPREVOCATIONVALIDATION</l7:Name>
        <l7:Id>a60c5a8714b2e519a6c23192cf09ded5</l7:Id>
        <l7:Type>REVOCATION_CHECK_POLICY</l7:Type>
        <l7:TimeStamp>2022-05-30T12:36:16.994Z</l7:TimeStamp>
        <l7:Link rel="self" uri="https://myhost02.xxxx.com:8443/restman/1.0/revocationCheckingPolicies/a60c5a8714b2e519a6c23192cf09ded5"/>
        <l7:Resource>
            <l7:RevocationCheckingPolicy id="a60c5a8714b2e519a6c23192cf09ded5" version="20">
                <l7:Name>OCSPREVOCATIONVALIDATION</l7:Name>
                <l7:DefaultPolicy>true</l7:DefaultPolicy>
                <l7:ContinueOnServerUnavailable>false</l7:ContinueOnServerUnavailable>
                <l7:DefaultSuccess>false</l7:DefaultSuccess>
                <l7:RevocationCheckItems>
                    <l7:Type>OCSP from URL</l7:Type>
                    <l7:Url>http://foo.west.dev.xxxx.com:80</l7:Url>
                    <l7:AllowIssuerSignature>true</l7:AllowIssuerSignature>
                    <l7:TrustedSigners>3c880ed5addceb2e9ef308074f2c353f</l7:TrustedSigners>
                </l7:RevocationCheckItems>
            </l7:RevocationCheckingPolicy>
        </l7:Resource>
    </l7:Item>
</l7:List>    

I need to extract the following XML into a separate variable or file:

            <l7:RevocationCheckingPolicy id="a60c5a8714b2e519a6c23192cf09ded5" version="20">
                <l7:Name>OCSPREVOCATIONVALIDATION</l7:Name>
                <l7:DefaultPolicy>true</l7:DefaultPolicy>
                <l7:ContinueOnServerUnavailable>false</l7:ContinueOnServerUnavailable>
                <l7:DefaultSuccess>false</l7:DefaultSuccess>
                <l7:RevocationCheckItems>
                    <l7:Type>OCSP from URL</l7:Type>
                    <l7:Url>http://foo.west.dev.xxxx.com:80</l7:Url>
                    <l7:AllowIssuerSignature>true</l7:AllowIssuerSignature>
                    <l7:TrustedSigners>3c880ed5addceb2e9ef308074f2c353f</l7:TrustedSigners>
                </l7:RevocationCheckItems>
            </l7:RevocationCheckingPolicy>

I can "flatten" the file into a single string using:

sed ':a;N;$!ba;s/\n//g' response.xml

But then I have to extract the string I need, i.e. everything between:

<l7:RevocationCheckingPolicy

and:

</l7:RevocationCheckingPolicy>

including both matching strings.

I can extract the substring without the matching strings:

sed ':a;N;$!ba;s/\n//g' response.xml | sed -e 's/.*<l7\:RevocationCheckingPolicy\(.*\)<\/l7\:RevocationCheckingPolicy>.*/\1/'

which gives me:

 id="a60c5a8714b2e519a6c23192cf09ded5" version="20">                            <l7:Name>OCSPREVOCATIONVALIDATION</l7:Name>                    <l7:DefaultPolicy>true</l7:DefaultPolicy>                    <l7:ContinueOnServerUnavailable>false</l7:ContinueOnServerUnavailable>                    <l7:DefaultSuccess>false</l7:DefaultSuccess>                    <l7:RevocationCheckItems>                        <l7:Type>OCSP from URL</l7:Type>                        <l7:Url>http://foo.west.dev.xxxx.com:80</l7:Url>                        <l7:AllowIssuerSignature>true</l7:AllowIssuerSignature>                        <l7:TrustedSigners>3c880ed5addceb2e9ef308074f2c353f</l7:TrustedSigners>                    </l7:RevocationCheckItems>

But the XML is missing the enclosing strings:

<l7:RevocationCheckingPolicy

at the beginning and:

</l7:RevocationCheckingPolicy>

at the end.

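I have seen suggestions to simply put the enclosing strings before and after the:

.*

in the second sed, but when I try that, the second sed no longer seems to match at all.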

Can anyone tell me how to include the enclosing strings?

Thanks, Jim

Using sed

$ sed -n '/<l7:RevocationCheckingPolicy/,\|</l7:RevocationCheckingPolicy>|p' input_file > outfile
$ cat outfile
            <l7:RevocationCheckingPolicy id="a60c5a8714b2e519a6c23192cf09ded5" version="20">
                <l7:Name>OCSPREVOCATIONVALIDATION</l7:Name>
                <l7:DefaultPolicy>true</l7:DefaultPolicy>
                <l7:ContinueOnServerUnavailable>false</l7:ContinueOnServerUnavailable>
                <l7:DefaultSuccess>false</l7:DefaultSuccess>
                <l7:RevocationCheckItems>
                    <l7:Type>OCSP from URL</l7:Type>
                    <l7:Url>http://foo.west.dev.xxxx.com:80</l7:Url>
                    <l7:AllowIssuerSignature>true</l7:AllowIssuerSignature>
                    <l7:TrustedSigners>3c880ed5addceb2e9ef308074f2c353f</l7:TrustedSigners>
                </l7:RevocationCheckItems>
            </l7:RevocationCheckingPolicy>
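
If you would rather keep the flattened, single-line approach from the question, one option is to move the capture group so that it wraps both tags instead of sitting between them. The following is only a sketch: it assumes GNU sed (as on most Linux machines), and the file name sample.xml and its shortened contents are made up for illustration.

```shell
# Hypothetical, shortened sample file used only for this illustration.
cat > sample.xml <<'EOF'
<l7:Resource>
    <l7:RevocationCheckingPolicy id="abc" version="20">
        <l7:Name>OCSPREVOCATIONVALIDATION</l7:Name>
    </l7:RevocationCheckingPolicy>
</l7:Resource>
EOF

# Step 1: join all lines into one (same command as in the question).
# Step 2: the \( ... \) group now starts at the opening tag and ends at the
#         closing tag, so \1 contains both enclosing strings.
flat=$(sed ':a;N;$!ba;s/\n//g' sample.xml |
  sed -e 's/.*\(<l7:RevocationCheckingPolicy .*<\/l7:RevocationCheckingPolicy>\).*/\1/')

printf '%s\n' "$flat"
```

The trailing space after the opening tag name keeps the leading .* from swallowing part of the match. Note that this keeps the result on one line, whereas the range form above preserves the original line breaks.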

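For what it's worth, you can use an XML tool such as xmllint to get the required element.
Advantage: it works regardless of the namespace prefix and of how the XML file is formatted.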

xmllint --xpath '//*[local-name()="RevocationCheckingPolicy"]' test.xml

Result

<l7:RevocationCheckingPolicy id="a60c5a8714b2e519a6c23192cf09ded5" version="20">
        <l7:Name>OCSPREVOCATIONVALIDATION</l7:Name>
        <l7:DefaultPolicy>true</l7:DefaultPolicy>
        <l7:ContinueOnServerUnavailable>false</l7:ContinueOnServerUnavailable>
        <l7:DefaultSuccess>false</l7:DefaultSuccess>
        <l7:RevocationCheckItems>
          <l7:Type>OCSP from URL</l7:Type>
          <l7:Url>http://foo.west.dev.xxxx.com:80</l7:Url>
          <l7:AllowIssuerSignature>true</l7:AllowIssuerSignature>
          <l7:TrustedSigners>3c880ed5addceb2e9ef308074f2c353f</l7:TrustedSigners>
        </l7:RevocationCheckItems>
      </l7:RevocationCheckingPolicy>

Alternatively, this can be done with the --shell option

echo 'cat //*[local-name()="RevocationCheckingPolicy"]' | xmllint --shell test.xml

Suggested awk script:

awk '/<l7:RevocationCheckingPolicy /,/l7:RevocationCheckingPolicy>/' input.xml
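
Since the question asks for the result in a variable or a file, here is a possible variation of the awk range idea that does both and stops reading after the first closing tag. It is a sketch under assumptions: the names sample.xml, policy.xml and the shell variable policy are invented for the example, and the sample contents are shortened.

```shell
# Hypothetical, shortened sample file used only for this illustration.
cat > sample.xml <<'EOF'
<l7:Resource>
    <l7:RevocationCheckingPolicy id="abc" version="20">
        <l7:Name>OCSPREVOCATIONVALIDATION</l7:Name>
    </l7:RevocationCheckingPolicy>
</l7:Resource>
EOF

# Print from the opening tag through the closing tag (both included),
# then exit so that any later elements with the same name are ignored.
policy=$(awk '
  /<l7:RevocationCheckingPolicy[ >]/ { inside = 1 }        # opening tag seen
  inside                             { print }             # emit lines while inside
  /<\/l7:RevocationCheckingPolicy>/  { if (inside) exit }  # closing tag: stop
' sample.xml)

# Keep it in the variable and also write it to a file.
printf '%s\n' "$policy" > policy.xml
```

Like the sed range version, this is line-based, so it relies on the opening and closing tags being on separate lines from the surrounding elements.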