How to sort inside a cell captured by awk

I have a file with lines like the following, where the 3rd column contains several space-separated values that I need to sort:

File: h1.csv

Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3348-1-27681 3347-1-28453
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3345-3-25691 3343-3-25314
Class S101-T2;3343-4-25310;3345-4-25691 3343-4-25314 3344-4-25314
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691

So, the expected output is:

Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3347-1-28453 3348-1-27681
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3343-3-25314 3345-3-25691
Class S101-T2;3343-4-25310;3343-4-25314 3344-4-25314 3345-4-25691
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691

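My idea is to capture the 3rd column with awk, sort it, and then print the result, but so far I have only managed to capture the column. I have not managed to sort it, or to print it without getting unwanted output.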

This is the code I have so far...

cat h1.csv | awk -F';' '{ gsub(" ","\n",$3); print $0 }'

I have also tried these (along with some others that gave errors):

cat h1.csv | awk -F';' '{ gsub(" ","\n",$3); print $3 | "sort -u" }'
cat h1.csv | awk -F';' '{ gsub(" ","\n",$3); sort -u; print $3 }'

So, is it possible to do this, and how? Any help is appreciated! Thanks...
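The plan described in the question (hand the values of column 3 to sort(1), then print) can work in plain POSIX awk. The second attempt misbehaves because awk keeps a single `sort -u` process open for the whole input, so the values of every line are sorted together and only printed at the end; `sort -u` inside the awk program in the third attempt is not a call to the sort command at all. A minimal sketch, not the asker's code: build one shell command per line, read its output back with `cmd | getline`, and `close()` it before the next line. The function name `sort_col3` is hypothetical, and it assumes the values contain only digits and hyphens, so they can be passed to the shell unquoted.

```shell
#!/bin/sh
# sort_col3 is a hypothetical name. It sorts the space-separated values in
# field 3 of each input line by running sort(1) from inside awk.
# Assumption: the values contain only digits and hyphens, so they can be
# passed to the shell without quoting.
sort_col3() {
  awk -F';' -v OFS=';' '{
    n = split($3, v, " ")
    cmd = "printf \"%s\\n\""          # the shell sees: printf "%s\n" v1 v2 ...
    for (i = 1; i <= n; i++)
      cmd = cmd " " v[i]
    cmd = cmd " | LC_ALL=C sort"      # byte-order sort; add -u to drop duplicates
    sorted = ""
    while ((cmd | getline line) > 0)  # read the sorted values back, one per line
      sorted = (sorted == "" ? line : sorted " " line)
    close(cmd)                        # start a fresh sort process for the next line
    $3 = sorted                       # rebuilds the record with OFS=";"
    print
  }'
}

sort_col3 <<'EOF'
Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3348-1-27681 3347-1-28453
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3345-3-25691 3343-3-25314
Class S101-T2;3343-4-25310;3345-4-25691 3343-4-25314 3344-4-25314
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691
EOF
```

Starting one sort process per line is slow for large files, but it keeps each line's values separate, which is what the single open pipe in the original attempt could not do.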

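One option could be to split the 3rd column on spaces and then sort the values with asort(), using GNU awk.

Then concatenate the first 2 fields and the split, sorted values again.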

awk '
BEGIN{FS=OFS=";"}
{
  n=split($3, a, " ")        # split the 3rd field on spaces into array a
  asort(a)                   # GNU awk: sort the values of a in place
  res = $1 OFS $2 OFS a[1]   # first two fields, then the smallest value
  for (i = 2; i <= n; i++) {
    res = res " " a[i]       # append the remaining values, space-separated
  }
  print res
}' file

Output

Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3347-1-28453 3348-1-27681
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3343-3-25314 3345-3-25691
Class S101-T2;3343-4-25310;3343-4-25314 3344-4-25314 3345-4-25691
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691
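asort() is a GNU awk extension, so this approach needs gawk; mawk and the BSD/macOS awk do not provide it. A minimal sketch of the same split, sort and rejoin steps in POSIX awk, with a hand-written insertion sort swapped in for asort(). The names `sort_field3` and `isort` are hypothetical, and the values are compared as strings.

```shell
#!/bin/sh
# sort_field3 is a hypothetical name: split field 3, sort it, rejoin it,
# using only POSIX awk (no GNU asort).
sort_field3() {
  awk '
  # Insertion sort of a[1..n]. Appending "" forces a string comparison,
  # even for values that look numeric.
  function isort(a, n,    i, j, key) {
    for (i = 2; i <= n; i++) {
      key = a[i]
      for (j = i - 1; j >= 1 && (a[j] "") > (key ""); j--)
        a[j + 1] = a[j]
      a[j + 1] = key
    }
  }
  BEGIN { FS = OFS = ";" }
  {
    n = split($3, v, " ")
    isort(v, n)
    res = (n > 0 ? v[1] : "")
    for (i = 2; i <= n; i++)
      res = res " " v[i]
    $3 = res              # rebuilds the record with OFS
    print
  }'
}

sort_field3 <<'EOF'
Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3348-1-27681 3347-1-28453
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3345-3-25691 3343-3-25314
Class S101-T2;3343-4-25310;3345-4-25691 3343-4-25314 3344-4-25314
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691
EOF
```

An insertion sort is quadratic, which is fine for a handful of values per cell; for long cells, piping to sort(1) would scale better.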

With GNU awk, using your shown samples, please try the following awk code.

awk '
BEGIN{
  FS=OFS=";"
  PROCINFO["sorted_in"] = "@val_num_asc"
}
{
  nf=val=""
  delete value
  num=split($NF,arr," ")
  for(i=1;i<=num;i++){
    split(arr[i],arr2,"-")
    value[arr2[1]]=arr[i]
  }
  for(i in value){
    nf=(nf?nf " ":"")value[i]
  }
  $NF=nf
}
1
'  Input_file

Explanation: Adding a detailed explanation for the above.

awk '                                     ##Starting awk program from here.
BEGIN{                                    ##Starting BEGIN section from here.
  FS=OFS=";"                              ##Setting FS, OFS as ; here.
  PROCINFO["sorted_in"] = "@val_num_asc"  ##Making for (i in ...) loops visit array elements by value, in ascending numeric order.
}
{
  nf=val=""                               ##Nullifying variables here.
  delete value                            ##Deleting value array here.
  num=split($NF,arr," ")                  ##Splitting last field into arr with separator as space here.
  for(i=1;i<=num;i++){                    ##Traversing through all elements of array arr.
    split(arr[i],arr2,"-")                ##Splitting the current element of arr into arr2 on - so that arr2[1] holds only its first part, e.g. 3344, 3345 etc.
    value[arr2[1]]=arr[i]                 ##Storing the whole element in array value, indexed by its first part arr2[1] (elements with the same first part overwrite each other).
  }
  for(i in value){                        ##Traversing through array value here.
    nf=(nf?nf " ":"")value[i]             ##Concatenating all values into nf, space-separated, in sorted order.
  }
  $NF=nf                                  ##Assigning nf to the last field here; this rebuilds the line using OFS.
}
1                                         ##printing edited/non-edited line here.
'  Input_file                             ##Mentioning Input_file name here.

Using GNU awk's sorted_in:

$ cat tst.awk
BEGIN {
    FS = OFS = ";"
    PROCINFO["sorted_in"] = "@val_str_asc"
}
{
    split($3,a," ")
    sorted = ""
    for (i in a) {
        sorted = (sorted=="" ? "" : sorted " ") a[i]
    }
    $3 = sorted
    print
}

$ awk -f tst.awk file
Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3347-1-28453 3348-1-27681
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3343-3-25314 3345-3-25691
Class S101-T2;3343-4-25310;3343-4-25314 3344-4-25314 3345-4-25691
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691

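Note that this assumes an alphabetical (string) sort, so it would sort 1000-1-1 before 200-1-1. That works as long as the strings you are sorting always consist of parts of the same length, i.e. 4digits-1digit-5digits.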
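If the parts can vary in length, one way around the 1000-1-1 before 200-1-1 problem is to compare the hyphen-separated parts one at a time as numbers. A minimal sketch in POSIX awk (no sorted_in needed), assuming every value has the form number-number-number; the names `sort_numeric_parts` and `cmp_parts` are hypothetical, and the per-part comparison plus insertion sort is a swapped-in technique, not part of the answer above.

```shell
#!/bin/sh
# sort_numeric_parts is a hypothetical name. Assumption: every value in
# field 3 has the form number-number-number (e.g. 3344-1-25446).
sort_numeric_parts() {
  awk '
  # Compare x and y part by part, each hyphen-separated part as a number.
  # Returns a negative, zero or positive number, like strcmp().
  function cmp_parts(x, y,    px, py, nx, ny, k) {
    nx = split(x, px, "-")
    ny = split(y, py, "-")
    for (k = 1; k <= nx && k <= ny; k++)
      if (px[k] + 0 != py[k] + 0)
        return (px[k] + 0) - (py[k] + 0)
    return nx - ny
  }
  BEGIN { FS = OFS = ";" }
  {
    n = split($3, v, " ")
    for (i = 2; i <= n; i++) {           # insertion sort using cmp_parts()
      key = v[i]
      for (j = i - 1; j >= 1 && cmp_parts(v[j], key) > 0; j--)
        v[j + 1] = v[j]
      v[j + 1] = key
    }
    res = (n > 0 ? v[1] : "")
    for (i = 2; i <= n; i++)
      res = res " " v[i]
    $3 = res
    print
  }'
}

# Hypothetical input line: a string sort would put 1000-1-1 first.
printf '%s\n' 'Class X;0-0-0;1000-1-1 200-1-1 200-1-10 200-1-9' | sort_numeric_parts
# prints: Class X;0-0-0;200-1-1 200-1-9 200-1-10 1000-1-1
```

On the sample file from the question this gives the same result as the string sort, because all of its parts have equal lengths.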