AWK

Question

I have the following input file:

Unit1 15 00:20:58
Unit1 30 01:10:00
Unit3 10 00:20:15
Unit2 5  00:45:00
Unit3 20 00:30:00
Unit2 2  01:22:35
Unit2 3  01:35:22
Unit1 5  00:58:20

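Some background on this input file: it is a list of work units from an e-portal that I am responsible for analysing. Each line of the log gives the unit name ($1), the total number of questions the student completed before clicking submit ($2), and the time it was recorded ($3). The data has been adjusted to give a clearer example.

I would like to output the following:

Unit1
---------------------
00
========
20
--------
01
========
30
--------
Unit2
---------------------
00
========
5
--------
01
========
5
--------
Unit3
---------------------
00
========
30
--------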

My current code is as follows:

#!/usr/bin/gawk -f

{ #Start of MID
        key = $1 #Message Extracted 10 Total
        key2 = substr($3,1,2) #Hour
        MSG_TYPE[key]++ #Distinct Message
        HOUR_AR[key2]++
        HT_AR[key2] += $2 #Tots up the total for each message by hour

} #End of MID
END {
                for (MSG in MSG_TYPE) {
                        print MSG
                        print "-----------------------------------"
                n=asorti(HOUR_AR, HOUR_SOR)
                for (i = 1; i <= n; i++) {
                            print HOUR_SOR[i]
                            print "========="
                            print HOUR_AR[HOUR_SOR[i]]
                            print "---------"
                            }
                            print "\n"
                    }
    } #End of END

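The logic behind this code is that it collects all the unique values of $1 in MSG_TYPE[], then scans through them in the for loop and prints each one. The hours are collected in the HOUR_AR[] array and sorted, and each pass of the MSG for loop should then, hopefully, return all the hours for that particular $1 and print the total for that hour AND MSG.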

Sorry this is long-winded; I just wanted to give enough detail. Any help is much appreciated.

Answer 1

For the given example, this code gives your expected output:

 awk -F'[ :]+' '{u[$1][$3]+=$2}
     END{for(i in u){
            print i;print "--------";
            for(j in u[i])
               print j"\n====\n"u[i][j]"\n---"}}' file

It outputs:

Unit1
--------
00
====
20
---
01
====
30
---
Unit2
--------
00
====
5
---
01
====
5
---
Unit3
--------
00
====
30
---

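Note that the sorting part is not done in this code, so the order in which the units and hours are printed is not guaranteed. But you get the idea: if you use GNU awk's arrays of arrays, the implementation becomes much easier.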

https://www.gnu.org/software/gawk/manual/html_node/Arrays-of-Arrays.html#Arrays-of-Arrays


AWK - Using arrays to count by hour and unique value

gawk