Sorting alphabetically using last column, using awk

Question

I am trying to sort lines of text that have a variable number of columns: sometimes there are 3 fields, sometimes 2.

Example input:

        George W. Bush
        Brack Obama
        Micky Mouse
        John F. Kennedy

Desired result:

         George W. Bush
         John F. Kennedy
         Micky Mouse
         Brack Obama

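I want them ordered alphabetically by last name, so using the $2 or $3 field.

So far I have flipped each line so that the last name comes first. However, once they are sorted I cannot seem to flip them back. I have tried arrays, and I get more output than expected (repeated lines).

I want to save this as an awk file only.

I considered using another awk file to flip them back to (let's say) an awk file script, but I can't create files in awk (using a bash script). I have been reading A Practical Guide to Linux, but the examples I see all look the same. Thank you for reviewing my question.

This is how I have done it so far: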

    {
         #print $3 " " $1 " " $2;
         if ($3 == "") {
            #print "me";
            print $2 " " $1;
            #list[$2] = $2 " " $1
        } else {
            print $3 " " $1 " " $2;
            #list[$3] = $3 " " $1 " " $2;}
            #for(result in list){    print list[result];   }
        }
    }


    gawk -f fileUsed alphRecoredToBeUsed | sort

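That leaves me with the flipped values sorted the way I want. However, I would like to present them in their original form (first name first) while keeping the alphabetical order.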

Answer 1

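One of my favorite awk variables is NF, the number of fields in the record; that is, you have $1 $2 ... $NF, where $NF is your last element. You can even do print $(NF-1) to have awk print your second-to-last element, or do any other arithmetic with that $(integer-after-math) notation if you find the need.

Rather than trying to swap everything around, organize the lines based on $NF, which is the last name on each line of your sample data.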
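
As a minimal sketch of how NF, $NF and $(NF-1) behave on names like the ones in the question, the snippet below uses only POSIX awk features. The function name show_fields is hypothetical, chosen for this illustration, and the "-" placeholder for one-word lines is likewise an assumption.

```shell
# show_fields: hypothetical helper. For every input line it prints the field
# count, the last field, and the second-to-last field ("-" if there is none).
show_fields() {
    awk '{ print NF, $NF, (NF > 1 ? $(NF-1) : "-") }' "$@"
}

printf '%s\n' 'George W. Bush' 'Brack Obama' | show_fields
```

Because $NF always names the last field, the same expression works whether a line has two words or three.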

Answer 2

Here is a script that uses gawk to sort on the last word of each line:

#!/bin/sh
gawk '
function compare(i1, v1, i2, v2) {
    ct1 = split(v1, pcs1)
    ct2 = split(v2, pcs2)
    f1 = ct1 < 1 ? "" : pcs1[ct1]
    f2 = ct2 < 1 ? "" : pcs2[ct2]
    if (f1 < f2) return -1;
    if (f1 > f2) return 1;
    return 0
}
{ lines[++ct] = $0 }
END {
    asort(lines, sorted_lines, "compare");
    for (i = 1; i <= length(sorted_lines); i++)
        print sorted_lines[i]
}
' "$@"

It works on your example:

$ cat input
George W. Bush
Brack Obama
Micky Mouse
John F. Kennedy
$ ./s input
George W. Bush
John F. Kennedy
Micky Mouse
Brack Obama

(I am using gawk 4.0.1, which supports user-supplied comparison functions.)

Answer 3

Here is a one-line awk command that produces the desired output:

$ awk '{a[$NF]=$0} END{PROCINFO["sorted_in"]="@ind_str_asc"; for(i in a)print a[i]}' file
        George W. Bush
        John F. Kennedy
        Micky Mouse
        Brack Obama

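Brief explanation:

a[$NF]=$0: use the array a to map $NF to $0.
PROCINFO["sorted_in"]="@ind_str_asc": traverse the array in ascending order of its indices, compared as strings. See the awk manual for more details. Note that this is specific to gawk.
for(i in a)print a[i]: because of the scan order set just before, the array is traversed in ascending order.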

Answer 4

To be useful, the sort needs to take all the fields into account.

One-liner:

$ awk '{s="";for (i=1;i<NF;i++)s=s $i;a[$NF s]=$0}END{n=asorti(a,b);for(j=1;j<=n;j++)print a[b[j]]}' input.txt

Explanation:

{
  s=""                                 # initialize s
  for (i=1;i<NF;i++) s=s $i            # concatenate first and middle names
  a[$NF s]=$0                          # use last name followed by other names
                                       # as index
}
END{
  n=asorti(a,b);                       # sort index of a
  for(j=1;j<=n;j++) print a[b[j]]      # print results
}

With this input:

$ cat input.txt
George W. Bush
George H.W. Bush
Michelle Obama
Barack Obama
Micky Mouse
John F. Kennedy

it gives:

$ awk '{s="";for (i=1;i<NF;i++)s=s $i;a[$NF s]=$0}END{n=asorti(a,b);for(j=1;j<=n;j++)print a[b[j]]}' input.txt
George H.W. Bush
George W. Bush
John F. Kennedy
Micky Mouse
Barack Obama
Michelle Obama

Since GNU awk 4.1 you can use the join function:

@include "join"
{
  n=split($0, a, " ")
  s=join(a, 1, n-1)
  b[$NF s]=$0
}
END{
  n=asorti(b,c);
  for(j=1;j<=n;j++) print b[c[j]]
}

Answer 5

In GNU awk:

$ awk '
{
    b=$NF                 # initialize the key buffer
    if(NF>1)              # if there are more than one word in the name
        for(i=1;i<NF;i++) # add them to the buffer
            b=b OFS $i
    a[b]=$0               # hash
}
END{
    PROCINFO["sorted_in"]="@ind_str_asc"  # order on the index using for
    for(i in a)
        print a[i]
}' file

Output (with some of the usual suspects added to the list for testing):

George H. W. Bush
George W. Bush
John F. Kennedy
John G. Kennedy
Madonna
Micky Mouse
Barack Obama
Brack Obama

As the hash key, the script uses lastname firstname_if_exists 1st_middle_if_exists and so on, e.g. a["Bush George H. W."]="George H. W. Bush".
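
For an awk without PROCINFO["sorted_in"], the same key can be fed to sort instead. This is only a sketch, not part of the answer above: it combines the "last name, then the other names" key with a decorate, sort, undecorate pipeline. The function name sort_by_last_name is hypothetical, and LC_ALL=C is an assumption made here so that the ordering is plain byte order on every system.

```shell
# sort_by_last_name: hypothetical helper. Sorts lines by their last word and
# breaks ties on the remaining words, using only POSIX awk, sort and cut.
sort_by_last_name() {
    awk '{
        key = $NF                       # last name first
        for (i = 1; i < NF; i++)        # then first and middle names
            key = key " " $i
        print key "\t" $0               # key, a tab, then the original line
    }' "$@" |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1 |   # sort on the key column only
        cut -f2-                                    # drop the key again
}

printf '%s\n' 'George W. Bush' 'Barack Obama' 'George H. W. Bush' 'Madonna' |
    sort_by_last_name
```

Since nothing is stored in an array, two input lines that produce the same key are both kept.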

Answer 6

Using GNU awk's sorted_in:

$ awk '
    { a[$NF]=($NF in a ? a[$NF] ORS : "") $0 }
    END { PROCINFO["sorted_in"]="@ind_str_asc"; for (i in a) print a[i] }
' file
George W. Bush
John F. Kennedy
Micky Mouse
Brack Obama

Or using any awk + sort + cut:

$ awk '{print $NF "\t" $0}' file | sort | cut -f2-
George W. Bush
John F. Kennedy
Micky Mouse
Brack Obama

Answer 7

This might be easier:

sh-4.4$ awk '{print $NF,$0}' file |sort -k1|awk '{$1="";print $0}'
 George W. Bush
 John F. Kennedy
 Micky Mouse
 Barack Obama

What it does: put the last name in front, sort, and then remove it from the output.
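
The leading space in that output appears because assigning an empty string to $1 makes awk rebuild the record with the output separator still in front of the remaining fields. A possible variant, sketched here under the assumption that the prepended key never contains a space, removes the key with sub() instead. The function name last_name_sort is hypothetical, and LC_ALL=C is added only to make the ordering predictable.

```shell
# last_name_sort: hypothetical helper. Prepends the last word as a sort key,
# sorts on it, then strips the key (and the one space after it) with sub().
last_name_sort() {
    awk '{ print $NF, $0 }' "$@" |
        LC_ALL=C sort -k1,1 |
        awk '{ sub(/^[^ ]+ /, ""); print }'
}

printf '%s\n' 'George W. Bush' 'Brack Obama' 'Micky Mouse' 'John F. Kennedy' |
    last_name_sort
```

Each line comes back exactly as it was read, including any indentation it originally had.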

Hope this helps.
