jq: from one JSON input, construct multiple rows of TSV using an expression against the keys?
Using jq, I can extract the data in a simple way like this:
find . -name '*.jsonl' | xargs -I {} jq -r '[.data.Item_A_Foo.value, .data.Item_A_Bar.value] | @tsv' {} >> foobar.tsv
find . -name '*.jsonl' | xargs -I {} jq -r '[.data.Item_B_Foo.value, .data.Item_B_Bar.value] | @tsv' {} >> foobar.tsv
find . -name '*.jsonl' | xargs -I {} jq -r '[.data.Item_C_Foo.value, .data.Item_C_Bar.value] | @tsv' {} >> foobar.tsv
...
# and so on
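But this seems wasteful. Is there a more advanced way to use jq, perhaps:
- filtering on a wildcard such as .data.Item_*_Foo.value, .data.Item_*_Bar.value, OR
- chaining these lines together into a single jq expression (reasonably readable and compact)?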
# Here is a made-up JSON file that motivates this question.
# Imagine there are 100,000 of these and they are larger.
{
  "data": {
    "Item_A_Foo": {
      "adj": "wild",
      "adv": "unruly",
      "value": "unknown"
    },
    "Item_A_Bar": {
      "adj": "rotund",
      "quality": "mighty",
      "value": "swing"
    },
    "Item_B_Foo": {
      "adj": "nice",
      "adv": "heroically",
      "value": "medium"
    },
    ... etc. for many Foo's and Bar's of A, B, C, ..., Z types
    "Not_an_Item": {
      "value": "doesn't matter"
    }
  }
}
The goal is:
unknown, swing # data.Item_A_Foo.value, data.Item_A_Bar.value
medium, hit # data.Item_B_Foo.value, data.Item_B_Bar.value
whatever, etc. # data.Item_C_Foo.value, data.Item_C_Bar.value
The details of your requirements are unclear, but you could proceed along the lines suggested by this jq filter:
.data
| (keys_unsorted|map(select(test("^Item_[^_]*_Foo$")))) as $foos
| ($foos | map(sub("_Foo$"; "_Bar"))) as $bars
| [ .[$foos[]].value, .[$bars[]].value]
| @tsv
The idea is to determine dynamically which keys to select.
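The filter above collects every Foo value and then every Bar value into a single row. If, as the goal in the question suggests, you want one row per item letter, a variant along the same lines can iterate over the keys one at a time instead. This is a sketch under stated assumptions: every input object has a .data object, item keys look like Item_<X>_Foo / Item_<X>_Bar with no extra underscore in <X>, and jq is 1.5 or later (for keys_unsorted, test and sub). The variable name filter is just for illustration.

```shell
# Per-row variant (a sketch): one TSV line per Item_<X>_Foo key,
# paired with the matching Item_<X>_Bar key.
# Assumes every input object has a .data object.
filter='
  .data
  | keys_unsorted[] as $foo                 # iterate over the key names of .data
  | select($foo | test("^Item_[^_]*_Foo$")) # keep only Item_<X>_Foo keys
  | ($foo | sub("_Foo$"; "_Bar")) as $bar   # derive the matching Bar key
  | [ .[$foo].value, .[$bar].value ]        # a missing Bar yields null
  | @tsv                                    # null should render as an empty field
'

# -r prints each TSV line raw rather than as a quoted JSON string.
# "-exec ... {} +" passes many files to each jq process, and each file is read once.
find . -name '*.jsonl' -exec jq -r "$filter" {} + > foobar.tsv
```

Because keys_unsorted keeps the keys in document order, the rows come out in the order the Foo keys appear in each object; swap in keys if you would rather have them sorted.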