Get item and subsequent item based on a property of the first one

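I have an event log file generated by a third-party tool that I cannot change. The log file is one huge JSON array in which the odd-numbered elements contain metadata and the even-numbered elements contain the body message associated with the preceding metadata. I want to split the file according to the metadata, aggregating the information by topic into different files.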

I am doing this project on Windows, and I am trying to use batch files and jq.

The array basically looks like this:

[
  { "type": "abc123"},
  {"name":"first component of type abc123"},
   { "type": "abc123"},
  {"name":"second component of type abc123"},
  { "type": "def124"},
  {"name":"first component of type def124"},
  { "type": "xyz999"},
  {"name":"first component of type xyz999"},
  { "type": "abc123"},
  {"name":"third component of type abc123"},
  { "type": "def124"},
  {"name":"second component of type def124"},
  { "type": "abc123"},
  {"name":"fifth component of type abc123"},
  { "type": "abc123"},
  {"name":"sixth component of type abc123"},
  { "type": "def124"},
  {"name":"third component of type def124"},
  { "type": "def124"},
  {"name":"fourth component of type def124"},
  { "type": "abc123"},
  {"name":"seventh component of type abc123"},
  { "type": "xyz999"},
  {"name":"second component of type xyz999"}
  ...
]

I know I only have 3 types, so what I want to achieve is to create one file per type. Something like:

First file

{
  "componentLog": {
       "type": "abc123",
       "information": [
          "first component of type abc123",
          "second component of type abc123",
          "third component of type abc123",
          ...
       ]
     }
}

Second file

{
  "componentLog": {
       "type": "def124",
       "information": [
          "first component of type def124",
          "second component of type def124",
          "third component of type def124",
          ...
       ]
     }
}

Third file

{
  "componentLog": {
       "type": "xyz999",
       "information": [
          "first component of type xyz999",
          "second component of type xyz999",
          "third component of type xyz999",
          ...
       ]
     }
}

I know I can separate out the metadata with this:

jq.exe ".[] | select(.type==\"product\")" file.json

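Then I tried to do arithmetic on the index. But index only returns the index of the first item matched by the select statement, so I don't know how to get from a metadata element to the element that follows it.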

The following bash script is a little messy because it assumes that none of the files (input or output) fit in memory.

If you don't already have access to bash, sed, and awk in your computing environment, you may want to install an environment that provides them, or you could adapt the script as appropriate, e.g. using gawk for Windows, or Ruby for Windows.

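The other main assumption, which is not already in the original question, is that it is OK to delete log-type*.tmp files and to overwrite log-TYPE.json for the various values of "type".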

Be sure to set input to the appropriate input file name.
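The script below hinges on an awk stage that reads alternating lines (a type, then the name belonging to it) and routes each name into a per-type temporary file. To see that stage in isolation, here is a minimal sketch fed with a hand-written stream standing in for the jq output. The `demo-type.` file prefix and the sample values are assumptions chosen for illustration so that nothing collides with the real script's files.

```shell
# Hand-written stand-in for the jq stage: lines alternate type, name, type, name, ...
printf '%s\n' \
  '"abc123"' '"first component of type abc123"' \
  '"def124"' '"first component of type def124"' \
  '"abc123"' '"second component of type abc123"' |
awk '
  # Odd lines carry the type: strip the surrounding quotes and remember it.
  NR%2 { fn=$0; gsub(/^"|"$/, "", fn); next }
  # Even lines carry the name: write it to the file for the remembered type.
  { print > ("demo-type." fn ".tmp") }
'

# Show what was collected for one type.
cat demo-type.abc123.tmp
```

Each temporary file ends up holding one JSON string per line, which is why the later formatting loop only has to add commas and the surrounding braces.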

# The input file name:
input=file.json

# Remove temporary files left over from a previous run (-f: no error if none exist)
rm -f log-type*.tmp

# Use jq to produce a stream of .type and .name values 
# as per the jq FAQ
jq -cn --stream '
   fromstream(1|truncate_stream(inputs))
   | if .type then .type else .name end'  "$input" |
 awk '
      NR%2 {fn=$0; sub("^\"","",fn); sub("\"$","", fn); next;} 
      { print > "log-type." fn ".tmp"}
'

for f in log-type.*.tmp ; do
    echo formatting $f ...
    g=$(sed -e 's/log-type.//' -e 's/.tmp$//' <<< "$f")
    echo g="$g"
    awk -v type="\"$g\"" '
      BEGIN { print "{\"componentLog\": { \"type\": " type " ,";
      print "\"information\": ["; }
      NR==1 { print; next }
      {print ",", $0} 
      END {print "]}}"; }' "$f" > "log-$g.json"
done
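If the input does fit in memory after all, the streaming machinery above may be unnecessary. The following is a minimal sketch of a shorter route under that assumption: jq pairs each even-indexed element with its successor, groups the pairs by type, and a small loop writes one file per type. The file name `sample.json` and its contents are hypothetical, and the output is compact rather than pretty-printed.

```shell
# Hypothetical sample input with the same shape as the array in the question.
cat > sample.json <<'EOF'
[
  {"type": "abc123"}, {"name": "first component of type abc123"},
  {"type": "def124"}, {"name": "first component of type def124"},
  {"type": "abc123"}, {"name": "second component of type abc123"}
]
EOF

# Pair each metadata element (even index) with the body that follows it,
# group the pairs by type, and emit one compact JSON document per type.
jq -c '
  [ range(0; length; 2) as $i | {type: .[$i].type, name: .[$i+1].name} ]
  | group_by(.type)[]
  | {componentLog: {type: .[0].type, information: map(.name)}}
' sample.json |
while IFS= read -r doc; do
  # Derive the output file name from the type stored in the document itself.
  t=$(printf '%s\n' "$doc" | jq -r '.componentLog.type')
  printf '%s\n' "$doc" > "log-$t.json"
done
```

This reads the whole array at once, so it trades the memory safety of the streaming script for brevity. It also overwrites any existing log-TYPE.json files, just as the script above does.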