Separate gawk header by column

I am trying to use gawk to split the header into 3 fields, but can't seem to get the desired result: $1 is the target column, $2 is the gene|GC column, and $3 is the average column.

gawk

gawk '{sub(/-[0-9]+/,"",$2); ar[$1]=$0}
      END{n = asort(ar)
          printf "%-8s%8s%8s\n", "Target", "Gene|GC", "Average Depth"
          for (i = 1; i <= n; i++)
              print ar[i]}' OFS='\t' file

Input

 chr2:198299650-198299769 SF3B1-823|gc=51.3 143.1
 chr17:42153038-42153421 G6PC3-1981|gc=61.6 406.7
 chr13:32903545-32903664 BRCA2-318|gc=27.7 39.6
 chr17:56811469-56811593 RAD51C-2465|gc=44.4 228.5   

Current output

TargetGene|GCAverage Depth
chr10:79793602-79793721 RPS24|gc=59.7   150.3
chr10:79795083-79795202 RPS24|gc=41.2   111.4
chr10:79797665-79797784 RPS24|gc=37 69.8
chr10:79799902-79800021 RPS24|gc=39.5   134.5

Desired output

Target                  Gene|GC         Average Depth
chr10:79793602-79793721 RPS24|gc=59.7   150.3
chr10:79795083-79795202 RPS24|gc=41.2   111.4
chr10:79797665-79797784 RPS24|gc=37 69.8

It looks like all I needed was:

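gawk '{sub(/-[0-9]+/,"",$2); ar[$1]=$0}   # strip the numeric suffix from the gene name, store each record keyed by target
      END{n = asort(ar)                   # sort the stored records by value; returns the count
          print "Target","Gene|GC","Average Depth"   # header fields are joined by OFS (a tab)
          for (i = 1; i <= n; i++)
              print ar[i]}' OFS='\t' file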

Not sure if this is the best way, but the output looks good. Thank you :).
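If tabs turn out not to line up (targets vary in length, so a tab stop can shift), one possible alternative is to pad the header and every row with the same printf format. This is only a sketch: the function name `format_depth` and the widths 26 and 16 are assumptions chosen to fit the sample rows, and it uses the external `sort` plus plain awk instead of gawk's `asort`.

```shell
# format_depth is a hypothetical helper name, not part of the original post.
format_depth() {
    # Sort the rows first, then print the header and rows with one shared format.
    LC_ALL=C sort "$1" | awk '
        BEGIN { fmt = "%-26s%-16s%s\n"          # assumed column widths
                printf fmt, "Target", "Gene|GC", "Average Depth" }
        { sub(/-[0-9]+/, "", $2)                # drop the numeric suffix from the gene name
          printf fmt, $1, $2, $3 }'
}

# Sample rows copied from the input shown in the question.
tmp=$(mktemp)
printf '%s\n' \
    'chr2:198299650-198299769 SF3B1-823|gc=51.3 143.1' \
    'chr17:42153038-42153421 G6PC3-1981|gc=61.6 406.7' \
    'chr13:32903545-32903664 BRCA2-318|gc=27.7 39.6' > "$tmp"
format_depth "$tmp"
```

Because the header and the data rows share one format string, the column starts stay aligned as long as no field is longer than its assumed width.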