Using awk for a report by gathering counts of (sub)instances

For this dataset (data.csv; the real file has a few hundred lines):

Input data

mig|Lecture|12.00
mig|Other|1.681
mige|Research|20.026
mige|Other|4.32
mige|Lecture|0.120
migc|Research|12.83
migc|Lecture|2.170
migc|Other|70.719
done|Research|24.794
done|Lecture|23.123
done|Other|9.96
done|NoMigration|6.9
mig|Research|5.4
md|Required|0.169
md|Required|0.02
mdc|NoMigration|0.122
mdc|Research|0.019
md|Required|2.12
mdc|Research|1.23
mdc|Other|18.53
mdc|Other|2.08
mdc|Lecture|2.5

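I want to get a report with the columns "Status", "Category", "Nodes", and "Quota", where Nodes is the number of input lines in a group and Quota is the sum of the third field for that group.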

Data dictionary

Incorrect output

Currently I get this:

done|Lecture|4|64.777
mdc|Lecture|6|24.481
md|Lecture|3|2.309
migc|Lecture|3|85.719
mige|Lecture|3|24.466
mig|Lecture|3|19.081

awk code

This is the awk snippet:

awk 'BEGIN {FS="|";OFS="|" }{
           nodes[$1]++;     # Increment count of lines.
           quota[$1] += $3; # Accumulate sum of third column.
        }
            END{for (x in nodes) {
        printf("%s|%s|%.f|%.3f\n",x, $2, nodes[x], quota[x]) | "sort";}}' data.csv

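The problem is getting the categories for each status: every line shows the category Lecture, and the counts and sums are per status instead of per status and category.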
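
A minimal sketch of what seems to go wrong, using a hypothetical three-line sample instead of data.csv: the array is keyed on the status alone, and in awk implementations that keep the last record available in the END block, $2 there is simply the category of the last line read.

```shell
# Hypothetical three-line sample in the same status|category|quota layout.
out=$(printf '%s\n' 'done|Research|24.794' 'done|Other|9.96' 'mdc|Lecture|2.5' |
awk 'BEGIN { FS = "|" }
{ nodes[$1]++ }                     # keyed on status only, categories are merged
END {
  for (x in nodes)
    print x "|" $2 "|" nodes[x]     # $2 here comes from the last record read
}' | LC_ALL=C sort)
printf '%s\n' "$out"
```

Both output lines carry the category Lecture, and the two done lines are counted together, which mirrors the incorrect report above.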

Desired output

The desired output should look like this (abbreviated):

done|Research|1|24.794
done|Lecture|1|23.123
done|Other|1|9.96
done|NoMigration|1|6.9
md|Required|3|2.309
mdc|NoMigration|1|0.122
mdc|Research|2|1.249
mdc|Other|2|20.61
mdc|Lecture|1|2.5
mig|Lecture|1|12
mig|Other|1|1.681
mig|Research|1|5.4
migc|Research|1|12.83
migc|Lecture|1|2.17
migc|Other|1|70.719
mige|Research|1|20.026
mige|Other|1|4.32
mige|Lecture|1|0.12

You can use a multidimensional array such as nodes[$1,$2], keyed on both status and category, and print the values in the END section.

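awk 'BEGIN {FS="|";OFS="|"}
{
  nodes[$1,$2]++        # number of lines per (status, category) pair
  quota[$1,$2] += $3    # sum of the third field per pair
}
END {
  for (i in nodes) {
    split(i, val, SUBSEP)   # recover status and category from the combined key
    print val[1], val[2], nodes[i], quota[i] | "sort"
  }
}
' data.csv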

Output

done|Lecture|1|23.123
done|NoMigration|1|6.9
done|Other|1|9.96
done|Research|1|24.794
mdc|Lecture|1|2.5
mdc|NoMigration|1|0.122
mdc|Other|2|20.61
mdc|Research|2|1.249
md|Required|3|2.309
migc|Lecture|1|2.17
migc|Other|1|70.719
migc|Research|1|12.83
mige|Lecture|1|0.12
mige|Other|1|4.32
mige|Research|1|20.026
mig|Lecture|1|12
mig|Other|1|1.681
mig|Research|1|5.4