In a CSV file, subtotal 2 columns based on a third one, using AWK in KSH

Disclaimers:

    1) English is my second language, so please forgive any grammatical horrors you may find. I am pretty confident you will be able to understand what I need despite these.
    2) I have found several examples on this site that address questions/problems similar to mine, but unfortunately I was not able to figure out the modifications needed to fit my needs.

"Problem":

I have a CSV file that looks like this:

c1,c2,c3,c4,c5,134.6,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,0.18,,c8,c9,SERVER2,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER2,c11
c1,c2,c3,c4,c5,416.09,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,12.1,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,480.64,,c8,c9,SERVER4,c11
c1,c2,c3,c4,c5,,83.65,c8,c9,SERVER5,c11
c1,c2,c3,c4,c5,,253.15,c8,c9,SERVER6,c11
c1,c2,c3,c4,c5,,18.84,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,8.12,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,22.45,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,117.81,c8,c9,SERVER8,c11
c1,c2,c3,c4,c5,,96.34,c8,c9,SERVER9,c11

Additional facts:

    1) File has 11 columns.
    2) The data in columns 1, 2, 3, 4, 5, 8, 9 and 11 is irrelevant in this case. In other words, I will only work with columns 6, 7 and 10.
    3) Column 10 will typically contain alphanumeric strings (server names), though they may also contain "-" and/or "_".
    4) Columns 6 and 7 will contain only numbers, with up to two decimal places (0 is a possible value). Only one of the two will have data per line, never both.
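
Facts 1 and 4 can be checked mechanically before any summing. The following is a minimal sketch. It assumes plain comma separation with no quoted fields, which is the same assumption `FS=","` makes in the answers. The hypothetical shell function `check_csv` reads the CSV on standard input and prints one message per offending line. It exits with status 1 if it found any such line. A literal `0` counts as data, since fact 4 allows it.

```shell
# check_csv is a hypothetical helper name; it reads the CSV on stdin.
check_csv() {
    awk 'BEGIN { FS = ","; bad = 0 }
         # Fact 1: every line must have exactly 11 columns.
         NF != 11 {
             printf "line %d: %d fields, expected 11\n", NR, NF
             bad = 1
             next
         }
         # Fact 4: count how many of columns 6 and 7 are non-empty ("0" counts).
         { filled = ($6 != "") + ($7 != "") }
         filled != 1 {
             printf "line %d: columns 6 and 7 must have exactly one value\n", NR
             bad = 1
         }
         # Non-zero exit status if any line was reported.
         END { exit bad }'
}

# Usage, with the sample data saved as "file":
# check_csv < file && echo "input looks OK"
```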

The output I need:

    - A single occurrence of every string in column 10 (as column 1), then the sum (subtotal) of its values in column 6 (as column 2) and, last, the sum (subtotal) of its values in column 7 (as column 3).
    - If the total for a field is "0", the field must be left empty, but it must still exist (its respective comma has to be printed).
    - **Note** that the strings in column 10 will be already alphabetically sorted, so there is no need to do that part of the processing with AWK.

Sample output, using the sample above as input:

SERVER1,134.6,,
SERVER2,0.18,,
SERVER3,428.19,,
SERVER4,480.64,,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,26.96

On these pages I have found not one but two AWK one-liners that partially do what I need:

awk -F "," 'NR==1{last=$10; sum=0;}{if (last != $10) {print last "," sum; last=$10; sum=0;} sum += $6;}END{print last "," sum;}' inputfile


awk -F, '{a[$10]+=$6;}END{for(i in a)print i","a[i];}' inputfile

My "problems" are the same in both cases:

    - Subtotals of 0 are printed.
    - I can only handle the sum of one column at a time. Whenever I try to add the second one, I either get a syntax error or it simply does not print the third column at all.

Thanks in advance for your support! Regards, Martin

Something like this?

$ awk 'BEGIN{FS=OFS=","}
            {s6[$10]+=$6; s7[$10]+=$7}
         END{for(k in s6) print k,(s6[k]?s6[k]:""),(s7[k]?s7[k]:"")}' file | sort

SERVER1,134.6,
SERVER2,0.18,
SERVER3,428.19,
SERVER4,480.64,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,49.41
SERVER8,,117.81
SERVER9,,96.34

Note that your handling of commas is inconsistent: you add an extra comma when the last field is zero (count the commas).
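
The question says column 10 already arrives sorted, so the `| sort` step is optional. `for (k in s6)` visits keys in an unspecified order, which is why the answer sorts afterwards. The following is a minimal sketch of one alternative: record the order in which server names are first seen, then print in that order from `END`. The function name `subtotal_in_order` and the array `order` are hypothetical and used for illustration only.

```shell
# subtotal_in_order is a hypothetical helper name; it reads the CSV on stdin.
subtotal_in_order() {
    awk 'BEGIN { FS = OFS = "," }
         # First time a server name (column 10) is seen: remember its position.
         !($10 in s6) { order[++n] = $10 }
         # Accumulate columns 6 and 7; an empty field adds 0.
         { s6[$10] += $6; s7[$10] += $7 }
         END {
             for (i = 1; i <= n; i++) {
                 k = order[i]
                 # A zero total becomes an empty field; its comma is kept.
                 print k, (s6[k] ? s6[k] : ""), (s7[k] ? s7[k] : "")
             }
         }'
}

# Usage, with the sample data saved as "file":
# subtotal_in_order < file
```

Unlike a script that compares each line with the previous key, this one still groups correctly if a server's lines are not adjacent. The cost is one array entry per server held in memory.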

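The expected output you posted doesn't seem to match the sample input you posted, so we're guessing, but this might be what you're looking for: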

$ cat tst.awk
BEGIN { FS=OFS="," }
$10 != prev {
    if (NR > 1) {
        print prev, sum6, sum7
    }
    sum6 = sum7 = ""
    prev = $10
}
$6 { sum6 += $6 }
$7 { sum7 += $7 }
END { print prev, sum6, sum7 }

$ awk -f tst.awk file
SERVER1,134.6,
SERVER2,0.18,
SERVER3,428.19,
SERVER4,480.64,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,49.41
SERVER8,,117.81
SERVER9,,96.34
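
One caveat applies to both answers. awk prints non-integer numbers through `OFMT`, whose default is `%.6g`, so a subtotal such as 1234567.89 would come out as `1.23457e+06`. The following is a minimal sketch of the `tst.awk` logic with a wider `OFMT`, wrapped in a hypothetical shell function `subtotal_sorted` that reads the CSV on standard input. The `%.15g` width is an assumption that totals stay within about 15 significant digits.

```shell
# subtotal_sorted is a hypothetical helper name; it reads the CSV on stdin
# and, like tst.awk, assumes lines for the same server are adjacent.
subtotal_sorted() {
    awk 'BEGIN { FS = OFS = ","; OFMT = "%.15g" }   # wider OFMT: no 1.23457e+06
         $10 != prev {
             # New server: flush the previous one (if any) and reset the sums.
             if (NR > 1) print prev, sum6, sum7
             sum6 = sum7 = ""
             prev = $10
         }
         # Only non-empty, non-zero values touch the sums, so a zero
         # subtotal stays "" and prints as an empty field.
         $6 { sum6 += $6 }
         $7 { sum7 += $7 }
         END { print prev, sum6, sum7 }'
}

# Usage, with the sample data saved as "file":
# subtotal_sorted < file
```

`%g` still drops trailing zeros, so 134.6 prints as 134.6 rather than 134.60.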