Sort groups of 12 lines based on one value
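I am trying to optimize the search for the top-ranked polynomials (https://maths-people.anu.edu.au/~brent/pd/Murphy-thesis.pdf) in a list of 500K lines of data. The list is organized in groups of 12 lines, each group in the following format: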
n: 533439167600904850230361756102700151678687933392166847323827307497363839257031077774321424872955045754669625577486179222154434651598903112919949771321416511589029559325246084363632977829645558547714072241
Y0: -2185827644152440194843077528225522129878
Y1: 119181810251841490251547
c0: 520196368294236390929241313007470334962
c1: 96360506527052960901419060941213412645
c2: 43791634664623702231347384357
c3: -9285559657533242039560613517
c4: 563452403603161952
c5: -21637936320
skew: 137792.000
lognorm 67.52, exp_E 62.03, alpha -1.81 (proj -2.68), 3 real roots
n: 533439167600904850230361756102700151678687933392166847323827307497363839257031077774321424872955045754669625577486179222154434651598903112919949771321416511589029559325246084363632977829645558547714072241
Y0: -2185827643535814056463203098120423438934
Y1: 1185320029877707674463
c0: 2018231558989478149929124495499518870153
c1: 877408379299126273318698618329767851376
c2: -103500370253681428439107986294
c3: -8603519648746439934492486528
c4: 220583232537944759
c5: -12839506680
skew: 431744.000
lognorm 68.01, exp_E 62.61, alpha 0.09 (proj -1.93), 3 real roots
How can I sort these groups by the value of a given parameter (lognorm or exp_E)?
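I don't think the sort command will do what you want without some "help".
So:
- combine all 12 lines of a group into one super string
- prepend the two sort fields to that string
- sort as desired
- convert back to the original format
The following is not the most efficient script, but it should be fairly easy to follow: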
# combine 12 lines into one super string
# precede each super string with the two potential sort fields
# (the first gawk reads the list from standard input)
gawk '
BEGIN{del="^"}
$0==""{next} ## skip blank line
{all=all $0 del} ## build up combo string
/lognorm/{
    L=$2 ## lognorm value, e.g. "67.52,"
    E=$4 ## exp_E value, e.g. "62.03,"
    sub(",","",L) ## strip the trailing comma from each value
    sub(",","",E)
    print L,E,all ## copy two potential sort fields to front of the string
    all=""
}' |
sort -n -k1,1 | ## or -k2,2 ### now we sort on desired field
gawk '{
    gsub(/[\^]/, "\n") # replace ^ with newline
    sub(/^[^ ]* [^ ]* /, "") # strip first two fields (we added above)
    print $0
}'
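The same decorate, sort, undecorate steps can also be wrapped in a small shell function that takes the sort key as an argument. This is only a sketch under a few assumptions: the name `sort_groups` is hypothetical, it uses plain `awk` instead of `gawk`, it assumes `^` never occurs in the data, and it forces `LC_ALL=C` so that `sort -n` treats `.` as the decimal point. Instead of hard-coding the field positions, it looks up the value that follows the key name on the lognorm line.

```shell
# sort_groups KEY [FILE...]   (hypothetical helper; KEY is lognorm or exp_E)
# Reads FILE (or standard input) and prints the groups in ascending KEY order.
sort_groups() {
    key=$1
    shift
    awk -v key="$key" '
        $0 == "" { next }                 # skip blank lines
        { all = all $0 "^" }              # join the lines of a group with ^
        /lognorm/ {                       # last line of a group
            val = ""
            for (i = 1; i < NF; i++)      # find the value that follows the key name
                if ($i == key) { val = $(i + 1); break }
            sub(/,/, "", val)             # drop the trailing comma
            print val, all                # decorate: sort key first
            all = ""
        }' "$@" |
    LC_ALL=C sort -n -k1,1 |              # numeric sort on the decorated key
    awk '{
        sub(/^[^ ]* /, "")                # undecorate: strip the sort key
        gsub(/\^/, "\n")                  # restore the original line breaks
        printf "%s", $0                   # each group already ends in a newline
    }'
}
```

Called as `sort_groups exp_E polys.txt > sorted.txt` (with `polys.txt` standing in for the real list), this writes the groups ordered by exp_E. Unlike the script above, it does not leave a blank line between groups.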