Extract string from JSON using grep

Question

I have a JSON input:

{
  "policyItems": [
    {
      "accesses": [
        {
          "type": "submit-app",
          "isAllowed": true
        }
      ],
      "users": [],
      "groups": [
        "Application_Team_1",
        "team2"
      ],
      "conditions": [],
      "delegateAdmin": false
    }
  ]
}

I ran curl from the command line to display the YARN queue policy:

curl  -u "login:password" http://myHost:6080/service/public/v2/api/service/YARN_Cluster/policy/YARN%20NameQueue/

It works fine.

Then I added grep to extract the list of all items in groups:

curl  -u "login:password" http://myHost:6080/service/public/v2/api/service/YARN_Cluster/policy/YARN%20NameQueue/ | 
grep -oP '(?<="groups": ")[^"]*'

The result is:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   579    0   579    0     0   4384      0 --:--:-- --:--:-- --:--:--  4419

It does not work. How can I do this with grep rather than jq?

Answer 1

You can use

grep -Poza '(?:\G(?!^)",|"groups":\s*\[)\s*"\K[^"]+'

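Options

P - interprets the pattern using the PCRE engine
o - outputs only the matched parts of the input
z - slurps the whole file, treating it as a single string (records end at NUL bytes rather than newlines)
a - treats the file as text (it should be used because the -z switch may trigger grep's binary-data behavior, which changes the return values).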
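
The effect of -z can be seen with a small experiment. The sketch below uses a trimmed copy of the groups array from the question and counts matches of a pattern that has to cross the line break after [. It assumes GNU grep built with PCRE support; the variable names json, per_line and whole_input are only illustrative.

```shell
# Trimmed sample based on the question's JSON (illustrative only).
json='{
  "groups": [
    "Application_Team_1",
    "team2"
  ]
}'

# Line by line (no -z): the pattern cannot cross the newline after "[",
# so no line matches. grep -c prints 0 and exits non-zero, hence "|| true".
per_line=$(printf '%s\n' "$json" | grep -Pc '"groups":\s*\[\s*"' || true)

# With -z the input is a single record, so \s* can span the newlines.
# -a keeps grep from treating that record as binary data.
whole_input=$(printf '%s\n' "$json" | grep -Pzac '"groups":\s*\[\s*"' || true)

echo "without -z: $per_line matching line(s)"
echo "with -z:    $whole_input matching record(s)"
```

The first count should be 0 and the second 1, which is why the multi-line pattern needs -z.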

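Pattern

(?:\G(?!^)",|"groups":\s*\[) - the end of the previous match (\G(?!^)) followed by a ", substring, or (|) the literal text "groups":, 0+ whitespace characters (\s*) and a [ character (\[)
\s*" - 0+ whitespace characters and a " character
\K - match reset operator that discards all text matched so far
[^"]+ - 1+ characters other than "

As you can see, this expression finds "groups": [", discards that text, and then matches only each value inside the double quotes that follow.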

See the PCRE regex demo.
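
To try the expression without access to the server, the command can be fed the JSON from the question directly. This is a minimal sketch, assuming GNU grep built with PCRE support; the function name extract_groups and the variable json are hypothetical, and tr is added only to turn the NUL separators that -z produces into newlines.

```shell
# Hypothetical helper: reads JSON on stdin, prints one group name per line.
extract_groups() {
  grep -Poza '(?:\G(?!^)",|"groups":\s*\[)\s*"\K[^"]+' | tr '\0' '\n'
}

# Sample input copied from the question.
json='{
  "policyItems": [
    {
      "accesses": [
        {
          "type": "submit-app",
          "isAllowed": true
        }
      ],
      "users": [],
      "groups": [
        "Application_Team_1",
        "team2"
      ],
      "conditions": [],
      "delegateAdmin": false
    }
  ]
}'

printf '%s\n' "$json" | extract_groups
```

With the sample above this should print Application_Team_1 and team2 on separate lines. In the question's pipeline, the same function could replace the original grep stage. Adding curl's -s option would also hide the progress meter, which is the only thing shown in the question's output: the lookbehind (?<="groups": ") expects a quoted string right after "groups":, but the value there is an array, so grep matched nothing.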
