Extract string from many brackets

Question

I have a file with the following content:

    ok: [10.9.22.122] => {
        "out.stdout_lines": [
            "cgit-1.1-11.el7.x86_64",
            "python-paramiko-2.1.1-0.9.el7.noarch",
            "varnish-libs-4.0.5-1.el7.x86_64",
            "kernel-3.10.0-862.el7.x86_64"
        ]
    }
    ok: [10.9.33.123] => {
        "out.stdout_lines": [
            "python-paramiko-2.1.1-0.9.el7.noarch"
        ]
    }

    ok: [10.9.44.124] => {
        "out.stdout_lines": [
            "python-paramiko-2.1.1-0.9.el7.noarch",
            "kernel-3.10.0-862.el7.x86_64"
        ]
    }

    ok: [10.9.33.29] => {
        "out.stdout_lines": []
    }
    ok: [10.9.22.28] => {
        "out.stdout_lines": [
            "NetworkManager-tui-1:1.12.0-8.el7_6.x86_64",
            "java-1.8.0-openjdk-javadoc-zip-debug-1:1.8.0.171-8.b10.el7_5.noarch",
            "java-1.8.0-openjdk-src-1:1.8.0.171-8.b10.el7_5.x86_64",
            "kernel-3.10.0-862.el7.x86_64",
            "kernel-tools-3.10.0-862.el7.x86_64",
        ]
    }

    ok: [10.2.2.2] => {
        "out.stdout_lines": [
            "monitorix-3.10.1-1.el6.noarch",
            "singularity-runtime-2.6.1-1.1.el6.x86_64"
        ]
    }

    ok: [10.9.22.33] => {
        "out.stdout_lines": [
            "NetworkManager-1:1.12.0-8.el7_6.x86_64",
            "gnupg2-2.0.22-5.el7_5.x86_64",
            "kernel-3.10.0-862.el7.x86_64",
        ]
    }

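I need to extract the IP between [] whenever stdout_lines contains kernel*.

I want to "emulate" a substring: save each "block" of content to a variable and work through the whole file.
How can I do this with sed, or another tool, when there are so many delimiters?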

Answer 1

Since your data is well formatted, you can use awk (gawk):

awk '
    # get the ip address
    /ok:/ {ip = gensub(/[^0-9.]/, "", "g", $0) }

    # check the stdout_lines block and print the ip saved above when a kernel package is listed
    /"out.stdout_lines":/,/\]/ { if (/\<[Kk]ernel\>/) print ip}
' file
#10.9.22.122
#10.9.44.124
#10.9.22.28
#10.9.22.28
#10.9.22.33

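Notes:

I adjusted the regex to reflect your updated data.
You may get more than one kernel package for the same IP under the out.stdout_lines block, which prints that IP several times. If that happens, just pipe the result to | uniq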
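
If you would rather not run a second process, the duplicates can also be dropped inside awk. The following is a minimal sketch and not part of the original answer: it assumes the same "ok: [IP] => {" layout as the question, uses a made-up helper name, kernel_ips, and avoids the gawk-only gensub and \< so that it should also run under other POSIX awks.

```shell
# kernel_ips FILE: print each IP whose stdout_lines block lists a kernel package.
# (Hypothetical helper; assumes the "ok: [IP] => {" layout from the question.)
kernel_ips() {
    awk '
        # Header line: keep only digits and dots, which leaves the IP.
        /ok:/ { ip = $0; gsub(/[^0-9.]/, "", ip) }

        # Inside the stdout_lines block, report an IP only the first time it matches.
        /"out.stdout_lines":/, /\]/ {
            if ($0 ~ /"[Kk]ernel-/ && !seen[ip]++) print ip
        }
    ' "$1"
}
```

Call it as kernel_ips file. Unlike uniq, which only collapses adjacent repeats, the seen array suppresses a repeated IP anywhere in the input.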

Answer 2

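Quick solution:

#!/bin/bash

AWK='
    # Remember the IP found between the brackets of each "ok:" line.
    /^ok:/ { gsub(/^.*\[/,""); gsub(/].*$/,""); ip=$0 }
    # Print the IP once per block. This pattern matches "Kernel-default" as written;
    # for the lowercase names in the sample above, use /"[Kk]ernel-/ instead.
    /"Kernel-default/ { if (ip) print ip; ip="" }
'
awk "$AWK" INPUT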

Answer 3

$ gawk -v RS="ok: " -F " => " '$2 ~ /[Kk]ernel/ { printf "The IP %s contains Kernel\n", $1 }' file
The IP [10.9.22.122] contains Kernel
The IP [10.9.44.124] contains Kernel

Answer 4

A GNU awk solution:

awk -F'\]|\[' 'tolower($3)~/"out.stdout_lines" *:/ && tolower($4)~/"kernel/{print "The IP " $2 " contains Kernel"}' RS='}' file

Output:

The IP 10.9.22.122 contains Kernel
The IP 10.9.44.124 contains Kernel
The IP 10.9.22.28 contains Kernel
The IP 10.9.22.33 contains Kernel

I used ] or [ as FS, the field separator, and } as RS, the record separator.
So the IP becomes $2. This solution depends on the structure: "out.stdout_lines" has to be in the field right after [ip], as in your example.

Another GNU awk way, without the above limitation:

awk -F']' 'match(tolower($0),/"out\.stdout_lines": *\[([^\]]+)/,m){if(m[1]~/"kernel/)print "The IP " substr($1, index($1,"[")+1) " contains Kernel"}' RS='}' file

Same output. The tolower calls make the match case-insensitive; if you want an exact match, remove them or just use the solution in Revision 6.

Combining the advantages of the two approaches above, a third way:

awk -F'\]|\[' 'match(tolower($0),/"out\.stdout_lines": *\[([^\]]+)/,m){if(m[1]~/"kernel/)print "The IP " $2 " contains Kernel"}' RS='}' file

If you don't need case-insensitive matching, change tolower($0) to $0.
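
To see why the IP lands in $2, it can help to print the fields of one record. The sketch below is illustrative only: the helper name show_fields is made up, and the separator is written as the bracket expression [][], which also matches either bracket, rather than with the escaped alternation used above.

```shell
# show_fields FILE: print the first four fields of every multi-field record,
# with records ending at "}" and fields split on "[" or "]".
# (Hypothetical helper for inspection only.)
show_fields() {
    awk -F'[][]' -v RS='}' '
        NF > 1 {
            for (i = 1; i <= 4; i++) {
                f = $i
                gsub(/[ \t\n]+/, " ", f)   # squeeze whitespace runs for display
                printf "$%d = <%s>\n", i, f
            }
        }
    ' "$1"
}
```

For a block in the question's layout, $1 holds "ok: ", $2 the IP, $3 the text up to and including "out.stdout_lines":, and $4 the package list.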

Answer 5

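Could you please try the following? I believe it should work with most awk versions. (I added [kK] to the match condition so that it looks for both kernel and Kernel: the OP's previous sample had a capital K and the current one has a lowercase k, so I wanted to cover both here.)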

awk '
/ok/{
   gsub(/.*\[|\].*/,"")
   ip=$0
}
/stdout_line/{
   found=1
   next
}
found && /[kK]ernel/{
   print ip
}
/}/{
   ip=found=""
}
'  Input_file

Explanation: adding an explanation for the above code.

awk '                       ##Start of the awk program.
/ok/{                       ##If the line contains the string ok, do the following.
   gsub(/.*\[|\].*/,"")     ##Remove everything up to [ and everything from ] onward in the current line.
   ip=$0                    ##Store the edited line, which is now just the IP, in the variable ip.
}                           ##End of the block for the ok check.
/stdout_line/{              ##If the line contains stdout_line, do the following.
   found=1                  ##Set the variable found to 1.
   next                     ##Skip the remaining rules for this line.
}                           ##End of the block for the stdout_line check.
found && /[kK]ernel/{       ##If found is set and the line contains kernel or Kernel, do the following.
   print ip                 ##Print the value of ip.
}                           ##End of the block for the kernel check.
/}/{                        ##If the line contains }, do the following.
   ip=found=""              ##Reset both ip and found.
}                           ##End of the block for the } check.
'   Input_file              ##Name of the input file.

The output is as follows.

10.9.22.122
10.9.44.124
10.9.22.28
10.9.22.28
10.9.22.33

Answer 6

This might work for you (GNU sed):

sed -n '/ok:/{s/[^0-9.]//g;:a;N;/]/!ba;/stdout_line.*kernel/P}' file

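The -n option suppresses implicit printing.

If a line contains the string ok:, it holds an IP address, so remove everything from that line except digits and periods.

Append further lines until one containing ] is encountered; if the pattern space then contains both stdout_line and kernel, print its first line.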
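
The same command can be laid out as a commented multi-line script, which may be easier to follow. This is only a sketch of the one-liner above, assuming GNU sed; the wrapper name kernel_ips_sed is made up for illustration.

```shell
# kernel_ips_sed FILE: print the IP of every block that lists a kernel package.
# (Hypothetical wrapper around the one-liner; assumes GNU sed.)
kernel_ips_sed() {
    sed -n '
        # Only act on header lines such as: ok: [10.9.22.122] => {
        /ok:/ {
            # Strip everything except digits and periods, leaving the IP.
            s/[^0-9.]//g
            :a
            # Append the next input line to the pattern space.
            N
            # No closing bracket yet: loop back to label a.
            /]/!ba
            # Block complete: print its first line (the IP) on a kernel match.
            /stdout_line.*kernel/P
        }
    ' "$1"
}
```

Because the whole block is tested once, after it has been collected, each IP is printed at most once per block, even when several kernel packages are listed.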

Answer 7

Using Perl

$ perl -0777 -ne 's!\[(\S+)\].+?\{(.+?)\}!$y=$1;$x=$2;$x=~/kernel/ ? print "$y\n":""!sge'  brenn.log
10.9.22.122
10.9.44.124
10.9.22.28
10.9.22.33


bash

scripting

awk

cut

sed