Using jq: join two JSON files that have different structures but share a common key
I have 2 files that I would like to join on a common key.
The first file is an array:
{"_time": "2022-02-20T23","csp_name": "1","tool_bf_id": "1234", "dvc_ssr": "aa-1111"}
{"_time": "2022-02-20T23","csp_name": "2","tool_bf_id": "4567", "dvc_ssr": "aa-2222"}
{"_time": "2022-02-20T23","csp_name": "3","tool_bf_id": "1357", "dvc_ssr": "null"}
{"_time": "2022-02-20T23","csp_name": "4","tool_bf_id": "2468", "dvc_ssr": "aa-1111"}
{"_time": "2022-02-20T23","csp_name": "5","tool_bf_id": "1246", "dvc_ssr": "null"}
The second file is also an array, with an exhaustive list of all distinct "dvc_ssr" values and their hosts:
{"host": "hostId1","dvc_ssr": "aa-1111"}
{"host": "hostId2","dvc_ssr": "aa-2222"}
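I am trying to join the two files by using the dvc_ssr value to add the information from the second file to the records of the first file.
I am expecting something like this: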
{"_time": "2022-02-20T23","csp_name": "1","tool_bf_id": "1234", "dvc_ssr": "aa-1111","host": "hostId1"}
{"_time": "2022-02-20T23","csp_name": "2","tool_bf_id": "4567", "dvc_ssr": "aa-2222","host": "hostId2"}
{"_time": "2022-02-20T23","csp_name": "3","tool_bf_id": "1357", "dvc_ssr": "null"}
{"_time": "2022-02-20T23","csp_name": "4","tool_bf_id": "2468", "dvc_ssr": "aa-1111","host": "hostId1"}
{"_time": "2022-02-20T23","csp_name": "5","tool_bf_id": "1246", "dvc_ssr": "null"}
After some research, I came across the idea of using flatten/group_by/add:
flatten | group_by(.dvc_ssr) | map(reduce .[] as $x ({}; . * $x))
or
[.[1] + . [0] | group_by(.dvc_ssr) [] | add]
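But this does not work.
The problem is that grouping puts all the "null" records into one group, and all records sharing the same "dvc_ssr" into another, so each group collapses into a single object. In the end I lose some records.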
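To make the record loss concrete, here is a small reproduction. It is only a sketch: it assumes both files contain the objects exactly as shown above, one per line, and it writes them to a scratch directory (the `workdir` variable is just for illustration). Slurping both files gives seven objects, and grouping them by dvc_ssr and merging each group leaves three objects instead of the five I want:

```shell
# Scratch copies of the two sample files from the question.
workdir=$(mktemp -d)
cat > "$workdir/file1.json" <<'EOF'
{"_time": "2022-02-20T23","csp_name": "1","tool_bf_id": "1234", "dvc_ssr": "aa-1111"}
{"_time": "2022-02-20T23","csp_name": "2","tool_bf_id": "4567", "dvc_ssr": "aa-2222"}
{"_time": "2022-02-20T23","csp_name": "3","tool_bf_id": "1357", "dvc_ssr": "null"}
{"_time": "2022-02-20T23","csp_name": "4","tool_bf_id": "2468", "dvc_ssr": "aa-1111"}
{"_time": "2022-02-20T23","csp_name": "5","tool_bf_id": "1246", "dvc_ssr": "null"}
EOF
cat > "$workdir/file2.json" <<'EOF'
{"host": "hostId1","dvc_ssr": "aa-1111"}
{"host": "hostId2","dvc_ssr": "aa-2222"}
EOF

# -s slurps both files into one flat array of 7 objects.
# group_by(.dvc_ssr) yields one group per distinct dvc_ssr value,
# and add collapses every group into a single object.
merged=$(jq -sc 'group_by(.dvc_ssr)[] | add' "$workdir/file1.json" "$workdir/file2.json")
echo "$merged"

# Count how many records survive the merge.
merged_count=$(printf '%s\n' "$merged" | wc -l | tr -d ' ')
echo "records after group_by/add: $merged_count"
```

The records with csp_name "1" and "4" end up in the same group, as do "3" and "5", which is exactly where the data goes missing.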
I was able to combine the files using jq -s (--slurp) and tried to group the arrays:
jq -s '[.[0] + .[1] | group_by(.dvc_ssr) []]' file1.json file2.json
Then I can drop the unused records coming from the second file with:
|map(select(if ._time!=null then . else empty end))| .[]
The real idea is to do a JOIN the way SQL does, matching on identical "dvc_ssr" values.
Using all the parameters of JOIN, you would do:
jq --slurpfile fst first.json --slurpfile snd second.json -nc '
JOIN(INDEX($snd[]; .dvc_ssr); $fst[]; .dvc_ssr; add)
'
{"_time":"2022-02-20T23","csp_name":"1","tool_bf_id":"1234","dvc_ssr":"aa-1111","host":"hostId1"}
{"_time":"2022-02-20T23","csp_name":"2","tool_bf_id":"4567","dvc_ssr":"aa-2222","host":"hostId2"}
{"_time":"2022-02-20T23","csp_name":"3","tool_bf_id":"1357","dvc_ssr":"null"}
{"_time":"2022-02-20T23","csp_name":"4","tool_bf_id":"2468","dvc_ssr":"aa-1111","host":"hostId1"}
{"_time":"2022-02-20T23","csp_name":"5","tool_bf_id":"1246","dvc_ssr":"null"}
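To see why unmatched records pass through unchanged, it can help to look at the two pieces separately. The following is a minimal sketch, assuming a jq build that includes the SQL-style builtins INDEX and JOIN, and using a scratch directory (`workdir`, a name chosen only for this illustration). It prints the lookup table that INDEX builds from the second file, then shows that a missing key yields null and that `add` leaves an object untouched when its partner is null:

```shell
# Scratch copy of the second file from the question.
workdir=$(mktemp -d)
cat > "$workdir/second.json" <<'EOF'
{"host": "hostId1","dvc_ssr": "aa-1111"}
{"host": "hostId2","dvc_ssr": "aa-2222"}
EOF

# INDEX(stream; key) folds the stream into one object keyed by .dvc_ssr.
lookup=$(jq -nc --slurpfile snd "$workdir/second.json" 'INDEX($snd[]; .dvc_ssr)')
echo "$lookup"

# Looking up a key that is absent from the table gives null ...
missing=$(echo "$lookup" | jq -c '.["null"]')
echo "$missing"

# ... and adding null to an object returns the object unchanged,
# so records without a match come out of `add` as they went in.
unchanged=$(jq -nc '[{"csp_name":"3","dvc_ssr":"null"}, null] | add')
echo "$unchanged"
```

Note that "null" in the sample data is a string, so it is simply a key that does not occur in the lookup table.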
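Depending on your application context, you can drop some of those parameters. For example, JOIN with only two parameters takes the left-hand records from its input (here the array that -s builds from first.json) and returns an array of [record, match] pairs, which lets you do the merging outside of JOIN.
Same example (note -s instead of -n, and JOIN(…)[] instead of JOIN(…)):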
jq --slurpfile snd second.json -sc '
JOIN(INDEX($snd[]; .dvc_ssr); .dvc_ssr)[] | add
' first.json
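If you want to inspect what the two-parameter form hands you before merging, you can leave off the final `| add`. This is a sketch under the same assumptions as before (a jq build with INDEX and JOIN, sample files recreated in a scratch directory named `workdir` purely for illustration). Each output line is a two-element array: the record from the first file and its match from the second file, or null when there is none:

```shell
# Scratch copies of both sample files.
workdir=$(mktemp -d)
cat > "$workdir/first.json" <<'EOF'
{"_time": "2022-02-20T23","csp_name": "1","tool_bf_id": "1234", "dvc_ssr": "aa-1111"}
{"_time": "2022-02-20T23","csp_name": "2","tool_bf_id": "4567", "dvc_ssr": "aa-2222"}
{"_time": "2022-02-20T23","csp_name": "3","tool_bf_id": "1357", "dvc_ssr": "null"}
{"_time": "2022-02-20T23","csp_name": "4","tool_bf_id": "2468", "dvc_ssr": "aa-1111"}
{"_time": "2022-02-20T23","csp_name": "5","tool_bf_id": "1246", "dvc_ssr": "null"}
EOF
cat > "$workdir/second.json" <<'EOF'
{"host": "hostId1","dvc_ssr": "aa-1111"}
{"host": "hostId2","dvc_ssr": "aa-2222"}
EOF

# Without `| add`, every output line is a [record, match] pair.
pairs=$(jq --slurpfile snd "$workdir/second.json" -sc '
  JOIN(INDEX($snd[]; .dvc_ssr); .dvc_ssr)[]
' "$workdir/first.json")
echo "$pairs"

# The third record has dvc_ssr "null", which is not in the index,
# so the second element of its pair is null.
third_match=$(printf '%s\n' "$pairs" | sed -n '3p' | jq -c '.[1]')
echo "$third_match"
```

From here you could merge differently than `add` does, for example by picking only selected fields from the match.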