How to sort inside a cell captured by awk
I have a file with lines like the following, where the third column holds several values that I need to sort:
File: h1.csv
Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3348-1-27681 3347-1-28453
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3345-3-25691 3343-3-25314
Class S101-T2;3343-4-25310;3345-4-25691 3343-4-25314 3344-4-25314
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691
So the expected output is:
Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3347-1-28453 3348-1-27681
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3343-3-25314 3345-3-25691
Class S101-T2;3343-4-25310;3343-4-25314 3344-4-25314 3345-4-25691
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691
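My idea was to capture the third column with awk, sort it, and then print the result, but so far I have only managed to capture the column. I have not been able to sort it or print the desired output.
This is the code I have so far:
cat h1.csv | awk -F';' '{ gsub(" ","\n",$3); print $0 }'
I also tried the following (along with a few others that produced errors):
cat h1.csv | awk -F';' '{ gsub(" ","\n",$3); print | "sort -u" }'
cat h1.csv | awk -F';' '{ gsub(" ","\n",$3); sort -u; print }'
So, is it possible to do this, and if so, how? Any help is appreciated. Thanks.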
One option is to split the third column on spaces and then sort the values with asort(), using gnu-awk.
Then join the first two fields and the sorted pieces back together.
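awk '
BEGIN{FS=OFS=";"}
{
  n=split($3, a, " ")      # split the third field on spaces
  asort(a)                 # gawk only: sort the values in place
  res = $1 OFS $2 OFS      # start with the first two fields
  for (i = 1; i <= n; i++) {
    res = res " " a[i]     # append each sorted value after a space
  }
  print res
}' file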
Output
Class S101-T1;3343-1-25310; 3344-1-25446 3345-1-25691 3347-1-28453 3348-1-27681
Class S101-T2;3343-2-25310; 3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310; 3343-3-25314 3345-3-25691
Class S101-T2;3343-4-25310; 3343-4-25314 3344-4-25314 3345-4-25691
Class S102-T1;3343-5-25310; 3344-5-25446 3345-5-25691
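Note that the output above has a leading space after the second semicolon, and that asort() is specific to GNU awk. A minimal sketch of a variant that avoids both, assuming only POSIX awk features; the function name `sort_col3` is hypothetical, and the insertion sort is swapped in for asort() purely for portability:

```shell
# sort_col3 (hypothetical name): sort the space-separated values in field 3
# of each ";"-delimited line, using plain string comparison.
sort_col3() {
  awk '
  BEGIN { FS = OFS = ";" }
  {
    n = split($3, a, " ")
    # Insertion sort; appending "" forces a string comparison
    for (i = 2; i <= n; i++) {
      v = a[i]
      for (j = i - 1; j >= 1 && (a[j] "") > (v ""); j--)
        a[j + 1] = a[j]
      a[j + 1] = v
    }
    # Rejoin without a leading space
    out = ""
    for (i = 1; i <= n; i++)
      out = out (i > 1 ? " " : "") a[i]
    $3 = out
    print
  }' "$@"
}
```

Usage would be `sort_col3 h1.csv`. Cells here hold only a handful of values, so the quadratic sort should not matter in practice.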
With GNU awk, using the samples you have shown, please try the following awk code.
awk '
BEGIN{
FS=OFS=";"
PROCINFO["sorted_in"] = "@val_num_asc"
}
{
nf=val=""
delete value
num=split($NF,arr," ")
for(i=1;i<=num;i++){
split(arr[i],arr2,"-")
value[arr2[1]]=arr[i]
}
for(i in value){
nf=(nf?nf " ":"")value[i]
}
$NF=nf
}
1
' Input_file
Explanation: Adding a detailed explanation for the above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS=OFS=";" ##Setting FS, OFS as ; here.
PROCINFO["sorted_in"] = "@val_num_asc" ##Setting PROCINFO using sorted_in to make sure array values are sorted by values in ascending order only.
}
{
nf=val="" ##Nullifying variables here.
delete value ##Deleting value array here.
num=split($NF,arr," ") ##Splitting last field into arr with separator as space here.
for(i=1;i<=num;i++){ ##Traversing through all elements of array arr.
split(arr[i],arr2,"-") ##Splitting the current element of arr into arr2 on - to get its first part, eg: 3344, 3345 etc.
value[arr2[1]]=arr[i] ##Storing the current element in array value, indexed by that first part.
}
for(i in value){ ##Traversing through array value here.
nf=(nf?nf " ":"")value[i] ##Concatenating all values to nf here.
}
$NF=nf ##Assigning last field value to nf here.
}
1 ##printing edited/non-edited line here.
' Input_file ##Mentioning Input_file name here.
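One thing to watch with this approach: because the array is indexed by the first part of each item, two items in the same cell that share that first part would collide, and only the last one would be kept. The sample data has no such case. A small sketch of the effect, using a hypothetical helper `count_first_part_keys` and a made-up cell (plain awk, no GNU extensions needed):

```shell
# count_first_part_keys (hypothetical name): mimic the keying step above
# and print how many array entries survive for one cell.
count_first_part_keys() {
  printf '%s\n' "$1" | awk '{
    n = split($0, arr, " ")
    for (i = 1; i <= n; i++) {
      split(arr[i], parts, "-")
      value[parts[1]] = arr[i]   # same first part: later item overwrites earlier one
    }
    count = 0
    for (k in value) count++
    print count
  }'
}

# Made-up cell with two items starting with 3344
count_first_part_keys '3344-1-25446 3344-1-25310 3345-1-25691'   # prints 2, not 3
```

If such duplicates can occur in your data, sorting the array of whole items by value avoids the problem.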
Using GNU awk sorted_in:
$ cat tst.awk
BEGIN {
FS = OFS = ";"
PROCINFO["sorted_in"] = "@val_str_asc"
}
{
split($3,a," ")
sorted = ""
for (i in a) {
sorted = (sorted=="" ? "" : sorted " ") a[i]
}
$3 = sorted
print
}
$ awk -f tst.awk file
Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3347-1-28453 3348-1-27681
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3343-3-25314 3345-3-25691
Class S101-T2;3343-4-25310;3343-4-25314 3344-4-25314 3345-4-25691
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691
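Note that this assumes alphabetical (string) ordering, so it would sort 1000-1-1 before 200-1-1. That works as long as the strings you are sorting always consist of parts of the same length, i.e. 4digits-1digit-5digits.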
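If the parts can vary in length, one way to get numeric ordering is to hand the items of a cell to `sort` with numeric keys on each `-`-separated part. A rough sketch, assuming POSIX `tr`, `sort` and `paste` are available; the helper name `sort_cell_numeric` and the sample values are made up for illustration:

```shell
# sort_cell_numeric (hypothetical name): sort the space-separated items of one
# cell numerically on each "-"-separated part, then join them with spaces again.
sort_cell_numeric() {
  printf '%s\n' "$1" | tr ' ' '\n' | sort -t- -k1,1n -k2,2n -k3,3n | paste -sd' ' -
}

sort_cell_numeric '1000-1-1 200-1-1 200-1-10 200-1-9'
# prints: 200-1-1 200-1-9 200-1-10 1000-1-1
```

A plain string sort of the same items would put 1000-1-1 first and 200-1-10 before 200-1-9.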