Format and filter file to CSV table

Question

I have a file containing many log entries:

PS: This question was inspired by an earlier question, but it is slightly refined.

at 10:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR5> [STR6 STR7] STR8:
academy/course1:oftheory:SMTGHO:nothing:
academy/course1:ofapplicaton:SMTGHP:onehour:

at 10:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR78> [STR6 STR111] STR8:
academy/course2:oftheory:SMTGHM:math:
academy/course2:ofapplicaton:SMTGHN:twohour:

at 10:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR758> [STR6 STR155] STR8:
academy/course3:oftheory:SMTGHK:geo:
academy/course3:ofapplicaton:SMTGHL:halfhour:

at 10:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR87> [STR6 STR74] STR8:
academy/course4:oftheory:SMTGH:SMTGHI:history:
academy/course4:ofapplicaton:SMTGHJ:nothing:

at 14:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR11> [STR6 STR784] STR8:
academy/course5:oftheory:SMTGHG:nothing:
academy/course5:ofapplicaton:SMTGHH:twohours:

at 14:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR86> [STR6 STR85] STR8:
academy/course6:oftheory:SMTGHE:music:
academy/course6:ofapplicaton:SMTGHF:twohours:

at 14:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR96> [STR6 STR01] STR8:
academy/course7:oftheory:SMTGHC:programmation:
academy/course7:ofapplicaton:SMTGHD:onehours:

at 14:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR335> [STR6 STR66] STR8:
academy/course8:oftheory:SMTGHA:philosophy:
academy/course8:ofapplicaton:SMTGHB:nothing:

I tried the following code, but it didn't work:

BEGIN {
    # set records separated by empty lines
    RS=""
    # set fields separated by newline, each record has 3 fields
    FS="\n"
}
{
    # remove undesired parts of every first line of a record
    sub("at ", "", $1)
    # now store the rest in time and course
    time=$1
    course=$1
    # remove time from string to extract the course title
    sub("^[^ ]* ", "", course)
    # remove course title to retrieve time from string
    sub(course, "", time)
    # get theory info from second line per record
    sub("course:theory:", "", $2)
    # get application info from third line
    sub("course:applicaton:", "", $3)
    # if new course
    if (! (course in header)) {
        # save header information (first words of each line in output)
        header[course] = course
        theory[course] = "theory"
        app[course] = "application"
    }
    # append the relevant info to the output strings
    header[course] = header[course] "," time
    theory[course] = theory[course] "," $2
    app[course] = app[course] "," $3

}
END {
    # now for each course found
    for (key in header) {
        # print the strings constructed
        print header[key]
        print theory[key]
        print app[key]
        print ""
    }
}
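
The attempt above fails for more than one reason, and two of them can be checked in isolation with a small BEGIN-only awk program. The strings are copied from what the attempt leaves in time and course for the first record; the names result, n1, n2 and line are only illustrative. First, sub() treats its first argument as a regular expression, so the "[STR6 STR7]" part of course is read as a bracket expression matching a single character, and sub(course, "", time) never matches. Second, the literal pattern "course:theory:" does not occur in lines such as academy/course1:oftheory:SMTGHO:nothing:. Note also that course still carries the whole STR* tail, so every record would become its own key.

```shell
result=$(awk 'BEGIN {
    # what the attempt holds in time and course for the first record
    time   = "10:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR5> [STR6 STR7] STR8:"
    course = "carl 1 STR0 STR1 STR2 STR3 <STR4 STR5> [STR6 STR7] STR8:"
    # 1) course is used as a regular expression: "[STR6 STR7]" is a
    #    bracket expression matching ONE character, so the literal text
    #    is never found and time is left unchanged
    n1 = sub(course, "", time)
    # 2) "course:theory:" is not a substring of the real detail line
    line = "academy/course1:oftheory:SMTGHO:nothing:"
    n2 = sub("course:theory:", "", line)
    # sub() returns the number of replacements it made
    print n1, n2
}')
echo "$result"
```

Both counts come out as 0, meaning neither substitution happened.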

Is there a way, with these STR* and SMTGH* strings present, to get this output:

carl 1,10:00,14:00
applicaton,halfhour,onehours
theory,geo,programmation

carl 2,10:00,14:00
applicaton,nothing,nothing
theory,history,philosophy

david 1,10:00,14:00
applicaton,onehour,twohours
theory,nothing,nothing

david 2,10:00,14:00
applicaton,twohour,twohours
theory,math,music

Answer 1

Using GNU awk (PROCINFO["sorted_in"] is a gawk extension):

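awk -F: -v OFS=, '
  # header line, e.g. "at 10:00 carl 1 STR0 ...": split on blanks instead of ":"
  /^at/ {
    split($0, f, " ")
    time = f[2]                  # "10:00"
    course = f[3] " " f[4]       # "carl 1"
    times[course] = times[course] OFS time
  }
  # detail lines: the wanted value is the next-to-last ":"-separated field
  $2 == "oftheory"     {th[course] = th[course] OFS $(NF-1)}
  $2 == "ofapplicaton" {ap[course] = ap[course] OFS $(NF-1)}
  END {
    # GNU awk only: iterate over the array indices in ascending string order
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (c in times) {
      printf "%s%s\n", c, times[c]
      printf "application%s\n", ap[c]
      printf "theory%s\n", th[c]
      print ""
    }
  }
' file

Output: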

carl 1,10:00,14:00
application,onehour,twohours
theory,nothing,nothing

carl 2,10:00,14:00
application,twohour,twohours
theory,math,music

david 1,10:00,14:00
application,halfhour,onehours
theory,geo,programmation

david 2,10:00,14:00
application,nothing,nothing
theory,history,philosophy
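
If GNU awk is not available, PROCINFO is just an ordinary array and the for (c in times) loop runs in an unspecified order. Below is a minimal sketch that keeps the paragraph-mode (RS="") idea from the question's attempt and should run under any POSIX awk. It assumes every record is exactly three lines (header, theory, application) and that the wanted value is always the next-to-last ":"-separated field. The function name logs_to_table is hypothetical, and groups are printed in the order they first appear in the file rather than sorted.

```shell
# Hypothetical helper: pass the log file as the first argument.
logs_to_table() {
    awk '
    BEGIN {
        RS = ""       # paragraph mode: blank lines separate records
        FS = "\n"     # each line of a record is one field
        OFS = ","
    }
    {
        # line 1: "at TIME NAME NUM STR..."; keep only TIME and "NAME NUM"
        split($1, head, " ")
        time = head[2]
        course = head[3] " " head[4]
        # lines 2 and 3 end in ":VALUE:", so VALUE is the next-to-last
        # ":"-separated piece (this also copes with the extra SMTGH field)
        n = split($2, t, ":"); theory = t[n - 1]
        n = split($3, a, ":"); app = a[n - 1]
        if (!(course in header)) {
            order[++count] = course   # remember first-seen order
            header[course] = course
            th[course] = "theory"
            ap[course] = "application"
        }
        header[course] = header[course] OFS time
        th[course] = th[course] OFS theory
        ap[course] = ap[course] OFS app
    }
    END {
        for (i = 1; i <= count; i++) {
            c = order[i]
            print header[c]
            print ap[c]
            print th[c]
            print ""
        }
    }' "$1"
}
```

Usage: logs_to_table file. For the sample data this should print the same four groups as the GNU awk answer, because the first-seen order (carl 1, carl 2, david 1, david 2) happens to match the sorted order.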

Tags: linux, bash, awk, zsh, filefilter