Awk script to extract lines with multiple different delimiters

Log file:

25 Apr 2022 02:55:08,062  ; 12345678908,123456789, added soc:[DSPSIA2D9,450, USGPRSPPF,0] deleted soc:[] ldap soc:[DSPSIA2D9,450, OPTSRA1H7,52, USGPRSPPF,0] db SOC:[OPTSRA1H7,52]

25 Apr 2022 02:55:08,872  ; 98765432101,234567833, added soc:[DSPSIA2EB,450, USGPRSPPF,0] deleted soc:[DSPSIA2CZ,450] ldap soc:[BBSUSPEND,0, DSPSIA2EB,450, OPTSRA1H7,52, USGPRSPPF,0, BBSUSPEND,0, USGPRSPPF,0] db SOC:[BBSUSPEND,0, BBSUSPEND,0, DSPSIA2CZ,450, OPTSRA1H7,52, USGPRSPPF,0]

25 Apr 2022 02:55:09,413  ; 23456789022,123456789, added soc:[DSPSIA2D6,450] deleted soc:[DSPSIA0R6,450] ldap soc:[BBSUSPEND,0, DSPSIA2D6,450, OPTSRA1H7,52, USGPRSPPF,0, BBSUSPEND,0] db SOC:[BBSUSPEND,0, BBSUSPEND,0, DSPSIA0R6,450, OPTSRA1H7,52, USGPRSPPF,0]

If "added soc" contains "USGPRSPPF", extract the sixth column value.

Expected output:

12345678908
98765432101

I would use GNU AWK for this task in the following way. Let the content of file.txt be

25 Apr 2022 02:55:08,062 ; 12345678908,123456789, added soc:[DSPSIA2D9,450, USGPRSPPF,0] deleted soc:[] ldap soc:[DSPSIA2D9,450, OPTSRA1H7,52, USGPRSPPF,0] db SOC:[OPTSRA1H7,52]

25 Apr 2022 02:55:08,872 ; 98765432101,234567833, added soc:[DSPSIA2EB,450, USGPRSPPF,0] deleted soc:[DSPSIA2CZ,450] ldap soc:[BBSUSPEND,0, DSPSIA2EB,450, OPTSRA1H7,52, USGPRSPPF,0, BBSUSPEND,0, USGPRSPPF,0] db SOC:[BBSUSPEND,0, BBSUSPEND,0, DSPSIA2CZ,450, OPTSRA1H7,52, USGPRSPPF,0]

25 Apr 2022 02:55:09,413 ; 23456789022,123456789, added soc:[DSPSIA2D6,450] deleted soc:[DSPSIA0R6,450] ldap soc:[BBSUSPEND,0, DSPSIA2D6,450, OPTSRA1H7,52, USGPRSPPF,0, BBSUSPEND,0] db SOC:[BBSUSPEND,0, BBSUSPEND,0, DSPSIA0R6,450, OPTSRA1H7,52, USGPRSPPF,0]

then

awk 'BEGIN{FS="[[:space:]]+|,"}/added soc:\[[^\]]*USGPRSPPF/{print $7}' file.txt

gives the output

12345678908
98765432101

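Explanation: I inform GNU AWK that the field separator (FS) is one or more (+) whitespace characters or (|) a comma (,). Then, for each line that contains added soc:[ followed by zero or more non-] characters followed by USGPRSPPF, I print the 7th field. Note that the literal [ and ] need to be escaped, because they have special meaning in regular expressions.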
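To check the field numbering, a minimal sketch like the following can list the first seven fields of one line under the same FS. The `line` variable and the `show_fields` helper are hypothetical names introduced only for this illustration, and the sample is a shortened copy of the first log line.

```shell
# Shortened copy of the first log line (assumption: same layout as file.txt).
line='25 Apr 2022 02:55:08,062  ; 12345678908,123456789, added soc:[DSPSIA2D9,450, USGPRSPPF,0] deleted soc:[]'

# show_fields is a hypothetical helper: it prints fields 1..7 with their index,
# splitting on runs of whitespace or on a single comma, as in the answer.
show_fields() {
  printf '%s\n' "$1" | awk 'BEGIN{FS="[[:space:]]+|,"} {for (i = 1; i <= 7; i++) print i ": " $i}'
}

show_fields "$line"
```

The comma inside the timestamp splits `02:55:08,062` into two fields and the `;` counts as a field of its own, which is why the subscriber number ends up in field 7 rather than field 6.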

(tested in gawk 4.2.1)
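The reason for anchoring the pattern to added soc:[ can be seen with a small comparison. This is only a sketch: `sample`, `count_naive`, and `count_anchored` are hypothetical names, the two lines are simplified stand-ins for the log format, and the bracket expression is written as `[^]]`, the POSIX spelling that should behave the same as `[^\]]` in gawk.

```shell
# Two simplified lines: only the first has USGPRSPPF inside "added soc:[...]",
# but both mention it in the later "ldap soc:[...]" list.
sample='t1 ; 111,1, added soc:[AAA,450, USGPRSPPF,0] deleted soc:[] ldap soc:[USGPRSPPF,0]
t2 ; 222,1, added soc:[BBB,450] deleted soc:[CCC,450] ldap soc:[USGPRSPPF,0]'

# Counts lines that mention USGPRSPPF anywhere.
count_naive() {
  printf '%s\n' "$sample" | awk '/USGPRSPPF/{n++} END{print n+0}'
}

# Counts lines where USGPRSPPF appears before the first ] after "added soc:[".
count_anchored() {
  printf '%s\n' "$sample" | awk '/added soc:\[[^]]*USGPRSPPF/{n++} END{print n+0}'
}

echo "naive: $(count_naive), anchored: $(count_anchored)"
```

Because `[^]]*` cannot cross a closing bracket, the anchored pattern stays inside the added soc list, whereas a bare `/USGPRSPPF/` would also select lines like the third one in file.txt.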