Get item and subsequent item based on a property of the first one
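I have an event log file generated by a third-party tool that I cannot change. The log file is one huge JSON array in which the odd elements contain metadata and the even elements contain the body message associated with that metadata. I want to split the file based on the metadata, aggregating the information by topic into separate files.
I am doing this project on Windows, and I am trying to use a batch file and jq.
The array basically looks like this: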
[
{ "type": "abc123"},
{"name":"first component of type abc123"},
{ "type": "abc123"},
{"name":"second component of type abc123"},
{ "type": "def124"},
{"name":"first component of type def124"},
{ "type": "xyz999"},
{"name":"first component of type xyz999"},
{ "type": "abc123"},
{"name":"third component of type abc123"},
{ "type": "def124"},
{"name":"second component of type def124"},
{ "type": "abc123"},
{"name":"fifth component of type abc123"},
{ "type": "abc123"},
{"name":"sixth component of type abc123"},
{ "type": "def124"},
{"name":"third component of type def124"},
{ "type": "def124"},
{"name":"fourth component of type def124"},
{ "type": "abc123"},
{"name":"seventh component of type abc123"},
{ "type": "xyz999"},
{"name":"second component of type xyz999"}
...
]
I know I only have 3 types, so what I want to achieve is to create one file per type. Something like:
First file
{
"componentLog": {
"type": "abc123",
"information": [
"first component of type abc123",
"second component of type abc123",
"third component of type abc123",
...
]
}
}
Second file
{
"componentLog": {
"type": "def124",
"information": [
"first component of type def124",
"second component of type def124",
"third component of type def124",
...
]
}
}
Third file
{
"componentLog": {
"type": "xyz999",
"information": [
"first component of type xyz999",
"second component of type xyz999",
"third component of type xyz999",
...
]
}
}
I know I can separate out the metadata with this:
jq.exe ".[] | select(.type==\"product\")" file.json
Then I tried to do math on the index. But index only returns the index of the first item matched by the select statement... so I don't know how to solve this...
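The following bash script is a bit messy because it assumes that none of the files (input or output) fit in memory.
If you don't already have access to bash, sed, and awk in your computing environment, you might want to consider installing wsl, mingw, or some such, or you could adapt the script as appropriate, e.g. using gawk for Windows, or Ruby for Windows.
The other main assumption not already stated in the original question is that it is OK to delete log-type*.tmp files and to overwrite log-TYPE.json for the various values of "type".
Be sure to set input to the appropriate input file name.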
# The input file name:
input=file.json
# Remove temporary files left over from a previous run, if any
/bin/rm -f log-type*.tmp
# Use jq to produce a stream of .type and .name values
# as per the jq FAQ
jq -cn --stream '
fromstream(1|truncate_stream(inputs))
| if .type then .type else .name end' "$input" |
awk '
# Odd lines hold the type: strip the quotes and remember it for the file name
NR%2 {fn=$1; sub("^\"","",fn); sub("\"$","", fn); next;}
# Even lines hold the name: append it to the file for the remembered type
{ print > ("log-type." fn ".tmp") }
'
for f in log-type.*.tmp ; do
echo "formatting $f ..."
g=$(sed -e 's/^log-type\.//' -e 's/\.tmp$//' <<< "$f")
echo g="$g"
awk -v type="\"$g\"" '
BEGIN { print "{\"componentLog\": { \"type\": " type " ,";
print "\"information\": ["; }
NR==1 { print; next }
{print ",", $0}
END {print "]}}"; }' "$f" > "log-$g.json"
done
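The pairing step in the middle of the pipeline can be tried on its own, without jq, by feeding it a few lines shaped like the jq output (a quoted type on each odd line, a quoted name on each even line). The sketch below is only an illustration: the helper name split_by_type, its directory argument, and the sample lines are assumptions made for the demo and are not part of the script above.

```shell
# split_by_type DIR
# Hypothetical helper: reads alternating type/name lines on stdin and
# appends each name line to DIR/log-type.TYPE.tmp.
split_by_type() {
  awk -v dir="$1" '
    # Odd lines: the type. Strip the surrounding quotes and remember it.
    NR%2 { fn=$1; sub("^\"","",fn); sub("\"$","",fn); next }
    # Even lines: the name. Write it to the file for the remembered type.
    { print > (dir "/log-type." fn ".tmp") }
  '
}

# Demo in a scratch directory so nothing in the current directory is touched.
demo=$(mktemp -d)
printf '%s\n' \
  '"abc123"' '"first component of type abc123"' \
  '"def124"' '"first component of type def124"' \
  '"abc123"' '"second component of type abc123"' |
  split_by_type "$demo"

# Show what was collected for one type.
cat "$demo/log-type.abc123.tmp"
```

Because awk keeps only the most recent type in fn, memory use does not depend on the size of the input, which fits the assumption that the files are too large to load at once.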