One-liner to sort by trailing number suffix
I have a series of lines with the following syntax:
sweets apple:11 banana:9 cherry:101 donut:1 egg tart:86
tossed added:5 anted:13 ashley:3 bandied:3 flung:6 lobbed:4 salad:26 slung:9
plenty abundance:3 a lot:83 ample:12 aroar:3 a ton:12 enow:5 gobs:5 lots:27 lotsa:8
(the large spaces are all tabs)
The desired output is columns 2+ sorted, in descending order, by the number after the colon. For example:
sweets cherry:101 egg tart:86 apple:11 banana:9 donut:1
tossed salad:26 anted:13 slung:9 flung:6 added:5 lobbed:4 ashley:3 bandied:3
plenty a lot:83 lots:27 a ton:12 ample:12 lotsa:8 enow:5 gobs:5 abundance:3 aroar:3
I use Ruby one-liners a lot:
# alphabetize within a line, delimited by pipes "|"
ruby -pe '$_=$_.strip.split("|").sort().join("|")+"\n"'
# case insensitive with no dupes:
ruby -pe '$_=$_.strip.split("|").sort_by{|x| x.downcase }.uniq.join("|")+"\n"'
# keep the first term:
ruby -pe '$_=$_.split(":")[0].strip+":"+$_.split(":")[1].strip.split("|").sort.join("|")+"\n"'
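But I can't think of a simple, clean way to sort by the trailing number, i.e. the ":NN". I'm sure it can be done in a few characters. How? I'd also be happy with an awk solution, but Ruby is usually cleaner for more complex processing.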
Assuming a is the result of splitting each line on the \t character:
irb(main):009:0> "#{a[0]}\t#{a[1..].sort { |a, b| b.split(":")[1].to_i <=> a.split(":")[1].to_i }.join("\t")}"
=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"
Each line is split on tabs, which gives us an array:
["headword", "apple:11", "banana:9", "cherry:101", "donut:1", "egg tart:86"]
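We can leave the first element alone. Then we sort the remaining elements by splitting each one into a key/value pair and comparing the second element (the number) of each. Because we compare b to a rather than a to b, we get descending order.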
ruby -lne 'a=$_.split("\t");puts "#{a[0]}\t#{a[1..].sort{|a,b|b.split(":")[1].to_i<=>a.split(":")[1].to_i}.join("\t")}"'
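The same comparator idea, wrapped in a small method so it can be tried outside a one-liner. This is a sketch: the method name sort_by_suffix_desc is hypothetical, and it assumes every field after the first ends in :NN.

```ruby
# Hypothetical helper: sort tab-separated fields 2+ by the number after ":",
# largest first, leaving the first field (the headword) where it is.
def sort_by_suffix_desc(line)
  head, *rest = line.chomp.split("\t")
  # Comparing b to a (instead of a to b) reverses the order: descending.
  sorted = rest.sort { |a, b| b.split(":")[1].to_i <=> a.split(":")[1].to_i }
  [head, *sorted].join("\t")
end

puts sort_by_suffix_desc("sweets\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86\n")
```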
> str = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
=> "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
> (x = str.split("\t"))[1..-1].sort_by { |x| x.split(':')[-1].to_i }.reverse.prepend(x[0]).join("\t")
=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"
str = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
hw, *arr = str.split("\t")
hw
#=> "headword"
arr
#=> ["apple:11", "banana:9", "cherry:101", "donut:1", "egg tart:86"]
[hw, *arr.sort_by { |s| -s[/(?<=:)\d+/].to_i }].join("\t")
#=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"
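Since the question asks for the trailing number, a pattern anchored to the end of the field may be a little safer than (?<=:)\d+, which grabs the first run of digits after any colon. A sketch, using a made-up field whose key itself contains a colon:

```ruby
# A made-up field whose key itself contains a colon.
field = "10:30 show:4"

p field[/(?<=:)\d+/]    # first digits after any colon => "30"
p field[/:(\d+)\z/, 1]  # capture group 1: digits after the last colon, at end of string => "4"

# The anchored pattern dropped into the same sort_by as above:
hw, *arr = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86".split("\t")
puts([hw, *arr.sort_by { |s| -s[/:(\d+)\z/, 1].to_i }].join("\t"))
```

Note that \z only matches at the very end of the string, so a field still carrying its trailing newline would not match; chomp the line first (or use ruby -l) before splitting.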
Given:
cat file
headword apple:11 zanana:9 cherry:101 donut:1 egg tart:86
In Ruby, I would do:
ruby -F"\t" -lane 'puts $F.sort_by{ |w|
idx=w[/(?<=:)\d+/]
if (idx.nil?)
-1/0.0
else
-idx.to_i
end
}.join("\t")' file
Or, if we know the first word doesn't need sorting and all the rest have numbers, you can do:
ruby -F"\t" -lane 'hw, *arr = $F; puts "#{hw}\t#{arr.sort_by{ |w| -w[/(?<=:)\d+/].to_i }.join("\t")}"' file
Or in GNU awk, you can do:
awk 'BEGIN{OFS="\t"}
function byn(i1, v1, i2, v2, l, r)
{
if (index(v1,":")==0 || index(v2,":")==0) return -1
split(v1,va1,/:/)
split(v2,va2,/:/)
if (va1[2]>va2[2])
return -1
else if (va1[2]==va2[2])
return 0
else
return 1
}
{split($0, fields, /\t+/)
asort(fields, result, "byn")
for (i=1; i<=length(result); i++)
printf "%s%s", result[i], i==length(result) ? ORS : OFS}' file
All three print:
headword cherry:101 egg tart:86 apple:11 zanana:9 donut:1
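The sample above has no ties, but the question's data does (ashley:3 and bandied:3, for instance), and Ruby's sort_by is not guaranteed to be stable, so equal numbers may come out in either order. If you want ties to keep their input order, one common trick is to add the original index as a secondary sort key. A sketch, assuming every field after the first has a :NN suffix:

```ruby
line = "tossed\tadded:5\tanted:13\tashley:3\tbandied:3\tflung:6\tlobbed:4\tsalad:26\tslung:9"
hw, *arr = line.split("\t")

# Sort key is [negated number, original position]. Arrays compare element by
# element, so equal numbers fall back to input order and ties stay stable.
sorted = arr.each_with_index
            .sort_by { |w, i| [-w[/(?<=:)\d+/].to_i, i] }
            .map(&:first)

puts([hw, *sorted].join("\t"))
```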
Not a one-liner, but here's my take:
line = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
fields = line.split(/\t/)
result = [fields[0]]
.concat(
fields[1..-1]
.map {|each| each.split(":")}
.sort {|a, b| b[1].to_i <=> a[1].to_i}
.map {|each| each.join(":")}
)
.join "\t"
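A variation on the split-into-pairs idea: String#rpartition splits on the last colon only, so a key that itself contains a colon still yields the right number (split(":")[1] would read "30 show" as 30 here). A sketch, using a hypothetical 10:30 show:4 field to show the difference:

```ruby
line = "headword\tapple:11\t10:30 show:4\tcherry:101"
head, *rest = line.split("\t")

result = [head].concat(
  rest
    .map { |f| f.rpartition(":") }            # e.g. "10:30 show:4" -> ["10:30 show", ":", "4"]
    .sort_by { |_key, _sep, num| -num.to_i }  # largest number first
    .map(&:join)                              # glue the three parts back together
).join("\t")

puts result
```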
Using GNU awk sorted_in:
$ cat tst.awk
BEGIN {
    FS=OFS="\t"
    PROCINFO["sorted_in"] = "@val_num_desc"
}
{
    delete nums    # clear the previous line's entries
    for (i=2; i<=NF; i++) {
        split($i,t,":")
        nums[i] = t[2]
    }
    out = $1
    for (i in nums) {
        out = out OFS $i
    }
    print out
}
$ awk -f tst.awk file
sweets cherry:101 egg tart:86 apple:11 banana:9 donut:1
tossed salad:26 anted:13 slung:9 flung:6 added:5 lobbed:4 bandied:3 ashley:3
plenty a lot:83 lots:27 a ton:12 ample:12 lotsa:8 enow:5 gobs:5 aroar:3 abundance:3
If you really find it useful for some reason to cram it all into a "one-liner", you can of course do that:
$ awk -F'\t' 'BEGIN{PROCINFO["sorted_in"]="@val_num_desc"} {delete n;for(i=2;i<=NF;i++){split($i,t,":");n[i]=t[2]}o=$1;for(i in n)o=o FS $i;print o}' file
sweets cherry:101 egg tart:86 apple:11 banana:9 donut:1
tossed salad:26 anted:13 slung:9 flung:6 added:5 lobbed:4 bandied:3 ashley:3
plenty a lot:83 lots:27 a ton:12 ample:12 lotsa:8 enow:5 gobs:5 aroar:3 abundance:3
but it loses a bit of clarity.
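For comparison, a Ruby counterpart of the awk script that processes a whole stream of lines rather than a single string. It is a sketch: the method name sort_stream is hypothetical, it reads from any IO-like object with each_line, it uses the end-anchored :NN pattern, and it breaks ties by input position.

```ruby
require "stringio"

# Hypothetical helper: for every line of `input`, keep field 1 and sort
# fields 2+ by their trailing ":NN" number, largest first.
def sort_stream(input, output)
  input.each_line do |line|
    head, *rest = line.chomp.split("\t")
    sorted = rest.each_with_index
                 .sort_by { |f, i| [-f[/:(\d+)\z/, 1].to_i, i] } # ties keep input order
                 .map(&:first)
    output.puts([head, *sorted].join("\t"))
  end
end

# Demo on the question's sample lines (fields separated by real tabs).
sample = <<~TSV
  sweets\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86
  tossed\tadded:5\tanted:13\tashley:3\tbandied:3\tflung:6\tlobbed:4\tsalad:26\tslung:9
  plenty\tabundance:3\ta lot:83\tample:12\taroar:3\ta ton:12\tenow:5\tgobs:5\tlots:27\tlotsa:8
TSV

out = StringIO.new
sort_stream(StringIO.new(sample), out)
print out.string
```

To read the files named on the command line, as the awk ... file examples do, a script could call sort_stream(ARGF, $stdout).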