bash + how to capture word from a long output

Question

I get the following output from this command:

zookeeper-shell.sh 19.2.6.4  get /brokers/ids/1010

The output is:

Connecting to 19.2.6.4

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://kafka1.lulu.com:6667"],"rack":"/default-rack","jmx_port":9997,"port":6667,"host":"kafka1.lulu.com","version":4,"timestamp":"1630507307906"}

The main goal is to capture the machine name kafka1 from the output above.

I managed to do that with the following long command:

zookeeper-shell.sh 19.2.6.4  get /brokers/ids/1010 | sed s'/\/\// /g' | sed s'/:/ /g' | sed s'/,/ /g' | sed s'/"/ /g' | sed s'/\./ /g'| awk '{for (i=1;i<=NF;i++) print $i}' | grep -i kafka | sort | uniq

The result is (as expected):

kafka1

The thing is, I feel my approach is bad: it is too long and not very elegant.

Can we get suggestions (an awk/sed/perl one-liner) that are much better than my syntax?

Answer 1

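With your shown samples, please try the following awk code. Since I don't have the zookeeper command, I wrote this code and tested it only against the output you showed.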

zookeeper-shell.sh 19.2.6.4  get /brokers/ids/1010 | 
awk '
/WatchedEvent state/{
  found=1
  next
}
found && match($0,/"PLAINTEXT:\/\/[^:]*/){
  print substr($0,RSTART+13,RLENGTH-13)
}
'

Explanation: a detailed explanation of the awk code above.

awk '                                         ##Starting awk program from here.
/WatchedEvent state/{                         ##Checking condition if line contains WatchedEvent state
  found=1                                     ##Then set found to 1 here.
  next                                        ##next will skip all further statements from here.
}
found && match($0,/"PLAINTEXT:\/\/[^:]*/){    ##Checking condition if found is SET then match regex "PLAINTEXT:\/\/[^:]* in match function of awk.
  print substr($0,RSTART+13,RLENGTH-13)       ##Printing the matched text without its 13-character "PLAINTEXT:// prefix.
}
'
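
If only the short name is needed, the same match-then-trim idea can probably be done with a single sed substitution. This is a minimal sketch tested only against the sample output from the question: sample_output is a hypothetical stand-in for the zookeeper-shell.sh command, and the pattern assumes the JSON line contains a "host":"..." field exactly as shown.

```shell
# Hypothetical stand-in for the real command; prints the sample output from the question.
sample_output() {
cat <<'EOF'
Connecting to 19.2.6.4

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://kafka1.lulu.com:6667"],"rack":"/default-rack","jmx_port":9997,"port":6667,"host":"kafka1.lulu.com","version":4,"timestamp":"1630507307906"}
EOF
}

# -n suppresses the default printing; the trailing p prints only the lines
# where the substitution succeeded. \([^".]*\) captures the host value up to
# the first dot, so the domain part is dropped.
host=$(sample_output | sed -n 's/.*"host":"\([^".]*\).*/\1/p')
echo "$host"   # prints: kafka1
```

With the real command, the sed part would simply go after zookeeper-shell.sh 19.2.6.4 get /brokers/ids/1010 |. Since it treats the JSON as plain text, it may break if the field layout changes.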

Answer 2

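The text you want to parse is JSON, so use a JSON-aware tool like jq to do most of the work. For example, using cat file since I don't have the command you use to produce the output: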

$ cat file | jq -Rr 'fromjson? | .endpoints[]'
PLAINTEXT://kafka1.lulu.com:6667

$ cat file | jq -Rr 'fromjson? | .endpoints[]' | awk -F'[/.]' '{print $3}'
kafka1
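
If you would rather not pipe into awk, the split can likely be done inside jq as well by reading the host field instead of endpoints. A minimal sketch, assuming jq 1.5 or later: sample_output is a hypothetical stand-in for the real command, and the objects filter is my addition so that lines which are not JSON objects are skipped.

```shell
# Hypothetical stand-in for the real command; prints the sample output from the question.
sample_output() {
cat <<'EOF'
Connecting to 19.2.6.4

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://kafka1.lulu.com:6667"],"rack":"/default-rack","jmx_port":9997,"port":6667,"host":"kafka1.lulu.com","version":4,"timestamp":"1630507307906"}
EOF
}

# -R reads every line as a raw string and -r prints the result without quotes.
# fromjson? drops lines that fail to parse, objects keeps only JSON objects,
# and split(".")[0] keeps the part of .host before the first dot.
short=$(sample_output | jq -Rr 'fromjson? | objects | .host | split(".")[0]')
echo "$short"
```

This keeps all of the parsing in one tool, at the cost of a slightly longer jq filter.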

Answer 3

With perl, you can:

$zookeeper_command | perl -MJSON::PP=decode_json -wnE'/^\{"/ or next; $j = decode_json($_); ($s) = (split /\./, $j->{host})[0]; say $s'

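The command in detail:

-MJSON::PP=decode_json => imports decode_json from the JSON::PP module (a core module).
/^\{"/ or next; => skips lines that do not look like a JSON string.
$j = decode_json($_); => decodes the JSON string into the data structure $j.
($s) = (split /\./, $j->{host})[0]; => splits the string kafka1.lulu.com and stores only the first part in $s.

It can also be written in a shorter (and less readable) form: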

$zookeeper_command | perl -MJSON::PP=decode_json -wnE'say decode_json($_)->{host}=~s/\..*$//r if/^\{"/'

Answer 4

You can filter out the data of interest with the following script, which avoids typing a long command line.

#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

use JSON;

while( <> ) {
    next unless /^\{.*?\}$/;   # skip all but JSON string

    my $data = from_json($_);  # restore data structure
    my $host = (split('\.',$data->{host}))[0]; # extract info of interest

    say $host;                 # output it
}

Run it as zookeeper-shell.sh 19.2.6.4 get /brokers/ids/1010 | script.pl.

Note: make the script executable with chmod +x script.pl and store it in the $HOME/bin directory, which must be listed in the $PATH variable.