Separate gawk header by column
I am trying to use gawk to separate the header into 3 fields, but can't seem to get the desired result: $1 is the Target column, $2 is the Gene|GC column, and $3 is the average column.
gawk
gawk '{sub(/-[0-9]+/,"",$2); ar[$1]=$0}
END{n = asort(ar)
printf "%-8s%8s%8s\n", "Target", "Gene|GC", "Average Depth"
for (i = 1; i <= n; i++)
print ar[i]}' OFS='\t' file
Input
chr2:198299650-198299769 SF3B1-823|gc=51.3 143.1
chr17:42153038-42153421 G6PC3-1981|gc=61.6 406.7
chr13:32903545-32903664 BRCA2-318|gc=27.7 39.6
chr17:56811469-56811593 RAD51C-2465|gc=44.4 228.5
Current output
TargetGene|GCAverage Depth
chr10:79793602-79793721 RPS24|gc=59.7 150.3
chr10:79795083-79795202 RPS24|gc=41.2 111.4
chr10:79797665-79797784 RPS24|gc=37 69.8
chr10:79799902-79800021 RPS24|gc=39.5 134.5
Desired output
Target Gene|GC Average Depth
chr10:79793602-79793721 RPS24|gc=59.7 150.3
chr10:79795083-79795202 RPS24|gc=41.2 111.4
chr10:79797665-79797784 RPS24|gc=37 69.8
It looks like all I needed was:
gawk '{sub(/-[0-9]+/,"",$2); ar[$1]=$0}
END{n = asort(ar)
print "Target","Gene|GC","Average Depth"
for (i = 1; i <= n; i++)
print ar[i]}' OFS='\t' file
Not sure if this is the best way, but the output is good. Thank you :).
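For comparison, here is a minimal sketch of the same idea that does not rely on gawk's asort: it prints the header with literal tabs, lets plain awk strip the "-<digits>" suffix from field 2, and hands the sorting to sort. The function name format_targets and the temporary sample file are hypothetical, the rows are copied from the input above, and LC_ALL=C is an assumption made only to keep the sort order predictable. The reason print works where the printf did not is that print joins its comma-separated arguments with OFS, while fixed widths like %8s are too narrow for "Average Depth" and do not line up with tab-separated rows.

```shell
#!/bin/sh
# Sketch only: format_targets and the sample file are hypothetical names.

# Print a tab-separated header, then the cleaned records in sorted order.
format_targets() {
    printf 'Target\tGene|GC\tAverage Depth\n'
    # Strip the "-<digits>" suffix from the gene name in field 2.
    # $1 = $1 forces awk to rebuild the record with OFS even if sub() changed nothing.
    awk 'BEGIN { OFS = "\t" } { sub(/-[0-9]+/, "", $2); $1 = $1; print }' "$1" |
        LC_ALL=C sort
}

# Sample rows copied from the input shown in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
chr2:198299650-198299769 SF3B1-823|gc=51.3 143.1
chr17:42153038-42153421 G6PC3-1981|gc=61.6 406.7
chr13:32903545-32903664 BRCA2-318|gc=27.7 39.6
chr17:56811469-56811593 RAD51C-2465|gc=44.4 228.5
EOF

format_targets "$sample"
rm -f "$sample"
```

Keeping the header outside the sorted stream means it always stays on the first line, whatever the data looks like.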