One-liner to sort by trailing number suffix

I have a series of lines with the following syntax:

sweets   apple:11   banana:9   cherry:101   donut:1   egg tart:86   
tossed   added:5   anted:13   ashley:3   bandied:3   flung:6   lobbed:4   salad:26   slung:9
plenty   abundance:3   a lot:83   ample:12   aroar:3   a ton:12   enow:5   gobs:5   lots:27   lotsa:8   
 
(the large spaces are all tabs)

The desired output has columns 2 onward sorted by the number after the colon, largest first. For example:

sweets   cherry:101   egg tart:86   apple:11   banana:9   donut:1   
tossed   salad:26   anted:13   slung:9   flung:6   added:5   lobbed:4   ashley:3   bandied:3
plenty   a lot:83   lots:27   a ton:12   ample:12   lotsa:8   enow:5   gobs:5   abundance:3   aroar:3   

I often use Ruby one-liners like these:

# alphabetize within a line, delimited by pipes "|"
ruby -pe '$_=$_.strip.split("|").sort().join("|")+"\n"'

# case insensitive with no dupes:
ruby -pe '$_=$_.strip.split("|").sort_by{|x| x.downcase }.uniq.join("|")+"\n"' 

# keep the first term:
ruby -pe '$_=$_.split(":")[0].strip+":"+$_.split(":")[1].strip.split("|").sort.join("|")+"\n"'

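But I can't come up with a simple, clear way to sort by the trailing number, i.e. the ":NN" part. I'm sure this can be done in a few characters. How? I'd also be happy with an awk solution, though Ruby is usually cleaner for more complex processing.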

Assume a is the result of splitting each line on \t characters.

irb(main):009:0> "#{a[0]}\t#{a[1..].sort { |a, b| b.split(":")[1].to_i <=> a.split(":")[1].to_i }.join("\t")}"
=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"

Each line is split on tabs. This gives us an array:

["headword", "apple:11", "banana:9", "cherry:101", "donut:1", "egg tart:86"]

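We can leave the first element alone. We then sort the remaining elements by splitting each one into a key/value pair and comparing the second element of each pair. Comparing b to a (rather than a to b) gives descending order.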
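
As a minimal sketch of the comparison just described, here is the same idea spelled out on one sample line. The names fields, head, rest and num are illustrative only and are not part of the answer:

```ruby
# Illustrative sketch; the variable names are assumptions for this example.
fields = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86".split("\t")

# Keep the first element aside and sort only the rest.
head, *rest = fields

# Extract the integer that follows the colon.
num = ->(s) { s.split(":")[1].to_i }

# Comparing b to a (instead of a to b) yields descending order.
sorted = rest.sort { |a, b| num.(b) <=> num.(a) }

result = [head, *sorted].join("\t")
puts result
```

Swapping the operands of <=> is one option; negating the key inside sort_by gives the same ordering for integer keys.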

ruby -lpe 'a=$_.split("\t");$_="#{a[0]}\t#{a[1..].sort{|a,b|b.split(":")[1].to_i<=>a.split(":")[1].to_i}.join("\t")}"'

> str = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
=> "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
> (x = str.split("\t"))[1..-1].sort_by { |x| x.split(':')[-1].to_i }.reverse.prepend(x[0]).join("\t")
=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"

str = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
hw, *arr = str.split("\t")
hw
  #=> "headword"
arr
  #=> ["apple:11", "banana:9", "cherry:101", "donut:1", "egg tart:86"]
[hw, *arr.sort_by { |s| -s[/(?<=:)\d+/].to_i }].join("\t")
  #=>"headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"
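
The sort key above relies on a regex lookbehind and a negation. The following is a small sketch of what each piece evaluates to; the names key, digits, missing and keys are made up for illustration:

```ruby
# Illustrative sketch of the sort key; names are assumptions for this example.

# (?<=:) is a lookbehind: match digits only when a colon comes right before them.
digits  = "egg tart:86"[/(?<=:)\d+/]   # the matched substring
missing = "headword"[/(?<=:)\d+/]      # no colon, so String#[] returns nil

# Negating the integer makes sort_by (which sorts ascending) put large numbers first.
# nil.to_i is 0, so an element without a number would get key 0.
key = ->(s) { -s[/(?<=:)\d+/].to_i }

keys = ["apple:11", "cherry:101", "donut:1"].map(&key)
p digits, missing, keys
```

Because an element without a ":NN" suffix ends up with key 0, it would sort after every numbered element, which is why the headword is split off before sorting.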

Given:

cat file
headword    apple:11    zanana:9    cherry:101  donut:1 egg tart:86

In Ruby, I would do:

ruby -F"\t" -lane  'puts $F.sort_by{ |w|  
    idx=w[/(?<=:)\d+/]
    if (idx.nil?)
        -1/0.0
    else 
        -idx.to_i
    end
}.join("\t")' file
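
To see why the -1/0.0 branch keeps the headword in front, here is a rough standalone sketch of the same key function. It assumes the sample line above has already been split on tabs; the names fields and sorted are illustrative:

```ruby
# Illustrative sketch; assumes the line has already been split on tabs.
fields = ["headword", "apple:11", "zanana:9", "cherry:101", "donut:1", "egg tart:86"]

sorted = fields.sort_by do |w|
  n = w[/(?<=:)\d+/]
  # -1/0.0 evaluates to -Infinity, which is smaller than any negated count,
  # so an element with no ":NN" suffix always sorts to the front.
  n.nil? ? -Float::INFINITY : -n.to_i
end

puts sorted.join("\t")
```

This avoids treating the first field specially, at the cost of a slightly longer block.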

Or, if we know the first word does not need sorting and all the others have numbers, you can do:

ruby -F"\t" -lane 'hw, *arr = $F; puts "#{hw}\t#{arr.sort_by{ |w| -w[/(?<=:)\d+/].to_i }.join("\t")}"' file 

Or in GNU awk, you can do:

awk 'BEGIN{OFS="\t"}
function byn(i1, v1, i2, v2,    l, r)
{
    if (index(v1,":")==0) return -1   # an element without ":" (the headword) sorts first
    if (index(v2,":")==0) return 1
    split(v1,va1,/:/)
    split(v2,va2,/:/)

    if (va1[2]>va2[2])
        return -1
    else if (va1[2]==va2[2])
        return 0
    else
        return 1
}
{split($0, fields, /\t+/)
asort(fields, result, "byn")
for (i=1; i<=length(result); i++) 
    printf "%s%s", result[i], i==length(result) ? ORS : OFS}' file

All three print:

headword    cherry:101  egg tart:86 apple:11    zanana:9    donut:1

Not a one-liner, but here is my take:

line = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
fields = line.split(/\t/)
result =  [fields[0]]
            .concat(
              fields[1..-1]
                .map {|each| each.split(":")}
                .sort {|a, b| b[1].to_i <=> a[1].to_i}
                .map {|each| each.join(":")}
            )
            .join "\t"

Using GNU awk's sorted_in:

$ cat tst.awk
BEGIN {
    FS=OFS="\t"
    PROCINFO["sorted_in"] = "@val_num_desc"
}
{
    delete nums    # clear entries left over from the previous record
    for (i=2; i<=NF; i++) {
        split($i,t,":")
        nums[i] = t[2]
    }
    out = $1
    for (i in nums) {
        out = out OFS $i
    }
    print out
}

$ awk -f tst.awk file
sweets  cherry:101      egg tart:86     apple:11        banana:9        donut:1
tossed  salad:26        anted:13        slung:9 flung:6 added:5 lobbed:4       bandied:3        ashley:3
plenty  a lot:83        lots:27 a ton:12        ample:12        lotsa:8 enow:5 gobs:5   aroar:3 abundance:3

If for some reason you really find it useful to cram it all into a "one-liner", then of course you can:

$ awk -F'\t' 'BEGIN{PROCINFO["sorted_in"]="@val_num_desc"} {delete n;for(i=2;i<=NF;i++){split($i,t,":");n[i]=t[2]}o=$1;for(i in n)o=o FS $i;print o}' file
sweets  cherry:101      egg tart:86     apple:11        banana:9        donut:1
tossed  salad:26        anted:13        slung:9 flung:6 added:5 lobbed:4       bandied:3        ashley:3
plenty  a lot:83        lots:27 a ton:12        ample:12        lotsa:8 enow:5 gobs:5   aroar:3 abundance:3

But it loses a bit of clarity.