jq: How can I pipe objects from an array to different files based on data in each object?
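I have a large number of objects stored in a master JSON file. I want to iterate over that array, take each object, and append it to a new file based on a field in the object (in this case, the state name). In other words, given one dataset containing many states, I want to split it into one file per state.
I have an existing jq expression that I use to filter down to only the data I actually need: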
{ fipscode: .fipscode, level: .level, polid: .polid, polnum: .polnum, precinctsreporting: .precinctsreporting, precinctsreportingpct: .precinctsreportingpct, precinctstotal: .precinctstotal, raceid: .raceid, runoff: .runoff, statepostal: .statepostal, votecount: .votecount, votepct: .votepct, winner: .winner }
Here is a sample of my input:
[
{ "ballotorder": 2, "candidateid": "9718", "delegatecount": 0, "description": null, "electiondate": "2018-08-28", "electtotal": 0, "electwon": 0, "fipscode": null, "first": "Doug", "id": "3015-polid-64364-state-AZ-1", "incumbent": true, "initialization_data": false, "is_ballot_measure": false, "last": "Ducey", "lastupdated": "2018-08-30T00:01:38.897Z", "level": "state", "national": true, "officeid": "G", "officename": "Governor", "party": "GOP", "polid": "64364", "polnum": "5554", "precinctsreporting": 1488, "precinctsreportingpct": 0.9993000000000001, "precinctstotal": 1489, "raceid": "3015", "racetype": "Primary", "racetypeid": "R", "reportingunitid": "state-AZ-1", "reportingunitname": null, "runoff": false, "seatname": null, "seatnum": null, "statename": "Arizona", "statepostal": "AZ", "test": false, "uncontested": false, "votecount": 355455, "votepct": 0.705493, "winner": true },
{ "ballotorder": 2, "candidateid": "21689", "delegatecount": 0, "description": null, "electiondate": "2018-08-28", "electtotal": 0, "electwon": 0, "fipscode": null, "first": "Ron", "id": "10046-polid-62557-state-FL-1", "incumbent": false, "initialization_data": false, "is_ballot_measure": false, "last": "DeSantis", "lastupdated": "2018-08-29T19:29:50.367Z", "level": "state", "national": true, "officeid": "G", "officename": "Governor", "party": "GOP", "polid": "62557", "polnum": "13918", "precinctsreporting": 5968, "precinctsreportingpct": 1.0, "precinctstotal": 5968, "raceid": "10046", "racetype": "Primary", "racetypeid": "R", "reportingunitid": "state-FL-1", "reportingunitname": null, "runoff": false, "seatname": null, "seatnum": null, "statename": "Florida", "statepostal": "FL", "test": false, "uncontested": false, "votecount": 913997, "votepct": 0.564728, "winner": true },
{ "ballotorder": 2, "candidateid": "45555", "delegatecount": 0, "description": null, "electiondate": "2018-08-28", "electtotal": 0, "electwon": 0, "fipscode": null, "first": "Rex", "id": "38538-polid-67011-state-OK-1", "incumbent": false, "initialization_data": false, "is_ballot_measure": false, "last": "Lawhorn", "lastupdated": "2018-08-29T02:44:44.610Z", "level": "state", "national": true, "officeid": "G", "officename": "Governor", "party": "Lib", "polid": "67011", "polnum": "40784", "precinctsreporting": 1951, "precinctsreportingpct": 1.0, "precinctstotal": 1951, "raceid": "38538", "racetype": "Runoff", "racetypeid": "L", "reportingunitid": "state-OK-1", "reportingunitname": null, "runoff": false, "seatname": null, "seatnum": null, "statename": "Oklahoma", "statepostal": "OK", "test": false, "uncontested": false, "votecount": 379, "votepct": 0.409287, "winner": false }
]
As output, I would like an Arizona.json that contains only the items from that state, also filtered to remove the fields I don't need:
[
{ "fipscode": null, "level": "state", "polid": "64364", "polnum": "5554", "precinctsreporting": 1488, "precinctsreportingpct": 0.9993000000000001, "precinctstotal": 1489, "raceid": "3015", "runoff": false, "statepostal": "AZ", "votecount": 355455, "votepct": 0.705493, "winner": true }
]
...and likewise for the other relevant states (Florida.json and Oklahoma.json).
Here is the bash and jq script I have so far:
cat master.json |
jq -cn --stream 'fromstream(1|truncate_stream(inputs))' |
jq -c '.statename as $state | {
fipscode: .fipscode,
level: .level,
polid: .polid,
polnum: .polnum,
precinctsreporting: .precinctsreporting,
precinctsreportingpct: .precinctsreportingpct,
precinctstotal: .precinctstotal,
raceid: .raceid,
runoff: .runoff,
statepostal: .statepostal,
votecount: .votecount,
votepct: .votepct,
winner: .winner
}'
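What I can't figure out is how to intercept each line so that I can decide where its output should go. Is this possible?
You can have one copy of jq split the data items out of the input file, then have another instance per state collect those items back together, with bash providing the glue. See the example below, which works with bash 4.2 or newer (it may work with 4.1; I'd need to check).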
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*|4.[01].*) echo "ERROR: Bash 4.2 required" >&2; exit 1;; esac
input_file=$1
[[ -s $input_file ]] || { echo "Usage: ${0##*/} input-file" >&2; exit 1; }
jq_split_script='
# modify this function to fit your needs
def relevantContentOnly:
{ fipscode, level, polid, polnum, precinctsreporting, precinctsreportingpct, precinctstotal, raceid, runoff, statepostal, votecount, votepct, winner };
.[] | [.statename, (relevantContentOnly | tojson)] | @tsv
'
# Use an associative array to map from state names to output FDs
declare -A out_fds=( )
# Read state / line-of-data pairs from our JQ script...
while IFS=$'\t' read -r state data; do
  # If we don't already have a writer for the current state, start one.
  if [[ ! ${out_fds[$state]} ]]; then
    exec {new_fd}> >(jq -n '[inputs]' >"$state.json")
    out_fds[$state]=$new_fd
  fi
  # Regardless, send the data to the FD we have for this state
  printf '%s\n' "$data" >&${out_fds[$state]}
done < <(jq -rc "$jq_split_script" <"$input_file") # ...running the JQ script above.
# close output FDs, so the JQ instances all flush
for fd in "${out_fds[@]}"; do  # iterate over the FD numbers (values), not the state names (keys)
  exec {fd}>&-
done
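The FD bookkeeping in the script above can be tried in isolation. The following is only a sketch under stated assumptions: `route_by_key` is a hypothetical helper name, plain files with a made-up `.ndjson` suffix stand in for the per-state jq writers (so nothing runs asynchronously), and the third sample record is invented for illustration. It needs a bash that supports `{varname}>` redirections.

```shell
#!/usr/bin/env bash
# Sketch: keep one output FD per key in an associative array.
# route_by_key is a hypothetical name; plain files replace the jq writers.
route_by_key() {
  local outdir=$1 key data fd
  local -A fds=()
  while IFS=$'\t' read -r key data; do
    # Open a new FD the first time a key is seen.
    if [[ -z ${fds[$key]:-} ]]; then
      exec {fd}>"$outdir/$key.ndjson"   # bash picks a free FD and stores its number in $fd
      fds[$key]=$fd
    fi
    # Write the payload to the FD that belongs to this key.
    printf '%s\n' "$data" >&"${fds[$key]}"
  done
  # Close every FD by number once the input is exhausted.
  for fd in "${fds[@]}"; do
    exec {fd}>&-
  done
}

# Demo with tab-separated "state<TAB>json" lines (the last record is made up).
demo_dir=$(mktemp -d)
printf '%s\t%s\n' \
  Arizona '{"polid":"64364"}' \
  Florida '{"polid":"62557"}' \
  Arizona '{"polid":"demo"}' |
  route_by_key "$demo_dir"
wc -l "$demo_dir"/*.ndjson
```

Each key costs one open FD for the lifetime of the loop, which is fine for a few dozen states but worth keeping in mind for inputs with very many distinct keys.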
Here is a simple solution that piggybacks off what you started with:
< master.json jq -cn --stream 'fromstream(1|truncate_stream(inputs))' |
jq -cr '.statename, {
fipscode,
level,
polid,
polnum,
precinctsreporting,
precinctsreportingpct,
precinctstotal,
raceid,
runoff,
statepostal,
votecount,
votepct,
winner
}' | while read -r statename && read -r object
do
echo "$object" >> "$statename.json"
done
Note that this appends the objects to any existing "$statename.json" file.
With your [original] sample data, the above produces Arizona.json, Florida.json, and Oklahoma.json.
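To see how the paired `read` calls consume the alternating lines, here is a minimal sketch that can be run without jq. The `printf` input stands in for the output of the second jq call, `split_pairs` is a hypothetical helper name, and the third record is made up for illustration.

```shell
# Sketch: route alternating "name line / object line" pairs to per-name files.
# split_pairs is a hypothetical helper; $1 is the directory to write into.
split_pairs() {
  outdir=$1
  # The first read takes the state name, the second takes the object on the next line.
  while read -r statename && read -r object
  do
    printf '%s\n' "$object" >> "$outdir/$statename.json"
  done
}

# Demo: simulated jq output (the last record is invented for the example).
demo_dir=$(mktemp -d)
printf '%s\n' \
  'Arizona' '{"polid":"64364"}' \
  'Florida' '{"polid":"62557"}' \
  'Arizona' '{"polid":"demo"}' |
  split_pairs "$demo_dir"
wc -l "$demo_dir"/*.json
```

Because the loop uses `>>`, running it twice against the same directory doubles the contents of each file, so it is probably best to start from an empty directory.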
Tweak
If the overhead of using echo is a concern, you could use awk instead:
awk '
fn!="" {print > fn; fn=""; next}
{fn=$0 ".json";
if (fns[fn]!=1){fns[fn]=1; print fn > "filenames.txt"}}'
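The awk program reads the same alternating name/object lines as the while loop. The sketch below wraps it in a hypothetical `split_pairs_awk` helper and feeds it simulated input, so it can be tried without jq; the third record is made up. One difference from the `>>` version is worth knowing: awk's `print > fn` truncates the file the first time it is opened in a run and appends after that, so pre-existing files are overwritten rather than extended.

```shell
# Sketch: the same split done by awk; split_pairs_awk is a hypothetical helper.
# $1 is the output directory; the subshell keeps the cd from leaking out.
split_pairs_awk() {
  ( cd "$1" && awk '
      # Object line: write it to the pending file, then wait for the next name.
      fn != "" { print > fn; fn = ""; next }
      # Name line: remember the target file and record it once in filenames.txt.
      { fn = $0 ".json"
        if (!(fn in seen)) { seen[fn] = 1; print fn > "filenames.txt" } }' )
}

# Demo: simulated jq output (the last record is invented for the example).
demo_dir=$(mktemp -d)
printf '%s\n' \
  'Arizona' '{"polid":"64364"}' \
  'Florida' '{"polid":"62557"}' \
  'Arizona' '{"polid":"demo"}' |
  split_pairs_awk "$demo_dir"
cat "$demo_dir/filenames.txt"
```

With a very large number of distinct states, some awk implementations may run into a limit on simultaneously open files, since every output file stays open until awk exits.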
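Grand finale
Since you want the files to contain arrays of objects, you can use jq -s to get the final result. I would probably collect the filenames in the while loop (naively, e.g. echo "$statename.json" >> filenames.txt), and then use sponge: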
sort -u filenames.txt |
while read -r fn ; do
jq -s . "$fn" | sponge "$fn"
done
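If sponge (from moreutils) is not installed, a temporary file plus mv can play the same role. This is only a sketch: `wrap_as_array` is a hypothetical helper name and the second sample record is made up.

```shell
# Sketch: rewrite a file of newline-delimited objects as one JSON array,
# using a temp file and mv as a stand-in for sponge.
# wrap_as_array is a hypothetical helper; $1 is the file to rewrite in place.
wrap_as_array() {
  fn=$1
  tmp=$(mktemp "$fn.XXXXXX") || return 1
  # Only replace the original if jq succeeded, so a parse error loses nothing.
  if jq -s . "$fn" > "$tmp"; then
    mv "$tmp" "$fn"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Demo (the second record is invented for the example).
demo_dir=$(mktemp -d)
printf '%s\n' '{"polid":"64364"}' '{"polid":"demo"}' > "$demo_dir/Arizona.json"
wrap_as_array "$demo_dir/Arizona.json"
jq -c . "$demo_dir/Arizona.json"
```

Note that `jq -s` wraps whatever it reads, so running this step a second time on the same file would nest the array inside another array.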