Awk: Count occurrences of negative values in each column and transpose CSV
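I am trying to write an awk script that lists, for each column, the countries with a negative value in that column, and counts them:
Sample data: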
COUNTRY NAME, SOCIAL SUPPORT, FREEDOM TO MAKE LIFE CHOICES, GENEROSITY, PERCEPTIONS OF CORRUPTION, POSITIVE AFFECT, NEGATIVE AFFECT, CONFIDENCE IN NATIONAL GOVERNMENT, DEMOCRATIC QUALITY, DELIVERY QUALITY
Afghanistan, 0.49, NULL, -0.11, 0.95, 0.49, 0.37, -0.26, -1.88, -1.43
Albania, 0.63, NULL, -0.03, 0.87, 0.66, 0.33, -0.45, 0.29, -0.13
Algeria, 0.80, NULL, -0.19, 0.69, 0.64, 0.34, 0.24, -0.92, -0.81
Argentina, 0.90, NULL, -0.18, 0.84, 0.80, 0.29, 0.30, 0.35, 0.15
Expected output:
4 FREEDOM TO MAKE LIFE CHOICES: Afghanistan, Albania, Algeria, Argentina
4 GENEROSITY: Afghanistan, Albania, Algeria, Argentina
3 DELIVERY QUALITY: Afghanistan, Albania, Algeria
2 CONFIDENCE IN NATIONAL GOVERNMENT: Afghanistan, Albania
2 DEMOCRATIC QUALITY: Afghanistan, Algeria
My script (based on an earlier answer from icarus on U&L):
#!/usr/bin/awk -f
BEGIN { FS="," }
NR==1 { for(i=2;i<=NF;i++) { name[i]=$i } ; next }
{
    for(i=2;i<=NF;i++) {
        v=$i+0
        if (v>0) continue;
        n=name[i]
        cnt[n]++
        cl[n] = cl[n] "," $1
    }
}
END {
    for (i in name) {
        n=name[i]
        printf("%-2d %s: %s\n",cnt[n]+0, n, cl[n] );
    }
}
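My script counts not only negative values but also NULL and 0.
I would like to sort the output by count, but I don't know how to sort inside the END block of an awk script.
Any ideas?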
If you can use GNU awk, you can control array traversal by setting PROCINFO["sorted_in"]:
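#!/usr/bin/gawk -f
BEGIN {FS = OFS = ", "}
# Header line: remember the name of each column
NR == 1 {
    for (i = 2; i <= NF; i++) quality[i] = $i
    next
}
{
    for (i = 2; i <= NF; i++) {
        # NULL evaluates to 0 in numeric context, so it is counted as well
        if ($i + 0 <= 0) {
            countries[i] = countries[i] OFS $1
            count[i]++
        }
    }
}
END {
    # Visit the elements of count in descending numeric order of their values
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (i in count) {
        # gensub strips the leading OFS from the list of countries
        printf "%d %s: %s\n", count[i], quality[i], gensub(OFS, "", 1, countries[i])
    }
}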
Then
gawk -f script.gawk file.csv
outputs
4 FREEDOM TO MAKE LIFE CHOICES: Afghanistan, Albania, Algeria, Argentina
4 GENEROSITY: Afghanistan, Albania, Algeria, Argentina
3 DELIVERY QUALITY: Afghanistan, Albania, Algeria
2 CONFIDENCE IN NATIONAL GOVERNMENT: Afghanistan, Albania
2 DEMOCRATIC QUALITY: Afghanistan, Algeria