awk to split and count numbers in a field

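In the awk below, I am trying to skip the header, extract each number between the : and , in $2, then print $1 and the count (the sum of those numbers), with the header put back in the output. My current output seems to duplicate each row and print the line as is. There may be empty columns in each line of the input, but it is always tab-delimited. Thank you :).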

awk

awk 'BEGIN{FS=OFS="\t"}; NR>1 {gsub(/:,/,"",$2); {count[$2]++} print $1,$count} FNR>1' file

Also tried:

awk -F'\t' '{gsub(/:,/,"",$2); {count[$2]++}
 END{print "id","string"; 
  print $1,count}}' file | column -t

file (tab-delimited)

id string
a1a B:80,V:2,Z:0
b2b B:100,V:1,Z:3

current (tab-delimited)

a1a
a1a B:80,V:2,Z:0
b2b
b2b B:100,V:1,Z:3

desired (tab-delimited)

id string
a1a 82
b2b 104
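
A quick check, offered as a sketch rather than a full diagnosis of the commands above: the regex /:,/ only matches a colon immediately followed by a comma, a sequence that never occurs in the sample data, so gsub() changes nothing. Separately, a pattern with no action (such as a trailing FNR>1) prints the record unchanged, which would account for the second copy of each row. The function name check_attempt is made up for this illustration.

```shell
# Hypothetical helper (not from the question): feeds the sample rows to awk.
check_attempt() {
    printf 'id\tstring\na1a\tB:80,V:2,Z:0\nb2b\tB:100,V:1,Z:3\n' |
    awk 'BEGIN { FS = OFS = "\t" }
         NR > 1 {
             # gsub() returns the number of substitutions it made
             n = gsub(/:,/, "", $2)
             print $1, "substitutions=" n
         }
         NR > 1   # a pattern with no action prints the whole record
    '
}
check_attempt
```

Each data row should appear twice: once with a substitution count of 0 and once exactly as it was read.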

You may use this awk:

awk '
BEGIN{FS=OFS="\t"}
NR == 1 {
   print
   next
}
n = split($2, a, /,/) {
   s = 0
   for (i=1; i<=n; ++i) {
      sub(/[^:]*:/, "", a[i])
      s += a[i]+0
   }
   print $1, s
}' file

id  string
a1a 82
b2b 104
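
The question mentions that some columns may be empty. As a hedged sketch (the c3c row with an empty second column is invented for illustration and is not in the original sample), this is how the script above behaves on such a row: split() returns 0 for an empty string, so the pattern is false and the row is skipped rather than printed with a sum of 0. The wrapper name sum_skip_empty is hypothetical.

```shell
# sum_skip_empty is a hypothetical wrapper name; the awk body is the script above.
sum_skip_empty() {
    awk '
    BEGIN { FS = OFS = "\t" }
    NR == 1 { print; next }
    n = split($2, a, /,/) {          # n is 0 (false) when $2 is empty
        s = 0
        for (i = 1; i <= n; ++i) {
            sub(/[^:]*:/, "", a[i])  # drop the "B:" style label
            s += a[i] + 0
        }
        print $1, s
    }'
}
# The c3c row (empty second column) is made-up test data.
printf 'id\tstring\na1a\tB:80,V:2,Z:0\nc3c\t\nb2b\tB:100,V:1,Z:3\n' | sum_skip_empty
```

If such rows should instead appear with a sum of 0, moving the split() call into an unconditional action would be one way to get that.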

With your shown samples, please try the following awk code.

awk '
BEGIN { FS=OFS="\t" }
FNR==1{
  print
  next
}
{
  sum=0
  num=split($2,arr,"[:,]")
  for(i=2;i<=num;i+=2){
    sum+=arr[i]
  }
  print $1,sum
}
'  Input_file

The output for your shown samples will be as follows:

id string
a1a 82
b2b 104

Explanation: Adding a detailed explanation for the above code.

awk '                         ##Starting awk program from here.
BEGIN { FS=OFS="\t" }         ##In BEGIN section setting FS and OFS to \t here.
FNR==1{                       ##Checking if this is the first line; if so, do the following.
  print                       ##Printing current line (the header) here.
  next                        ##next will skip all further statements from here.
}
{
  sum=0                       ##Resetting sum to 0 for each line here.
  num=split($2,arr,"[:,]")    ##Splitting 2nd field into array arr on the delimiters : and ,
  for(i=2;i<=num;i+=2){       ##Running for loop from i=2 till num in steps of 2.
    sum+=arr[i]               ##Adding arr[i] value to sum and keep adding it.
  }
  print $1,sum                ##Printing $1 and sum here.
}
' Input_file                  ##Mentioning Input_file name here.
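
To make the stride-2 loop concrete, here is a small sketch that prints what split() places in the array for the first sample value. The labels land at odd indices and the numbers at even ones, which is why the loop starts at 2 and steps by 2. The function name show_split is hypothetical.

```shell
# show_split is an illustrative name, not part of the answer above.
show_split() {
    printf '%s\n' "$1" | awk '{
        # Split on either : or , and list every element with its index
        num = split($0, arr, "[:,]")
        for (i = 1; i <= num; i++)
            printf "arr[%d]=%s\n", i, arr[i]
    }'
}
show_split 'B:80,V:2,Z:0'
```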

$ awk '
    BEGIN { FS=OFS="\t" }
    NR>1 {
        n = split($2,a,/[:,]/)
        sum = 0
        for ( i=2; i<=n; i+=2 ) {
            sum += a[i]
        }
        $2 = sum
    }
    { print }
' file
id      string
a1a     82
b2b     104

awk -F'[ \t:,]' 'NR==1{print}NF==7{print $1,$3+$5+$7}' input_file|column -t

id   string
a1a  82
b2b  104

An awk-based solution that needs neither arrays nor loops:

< input_file.txt | 

{m,g}awk 'BEGIN { FS = "[,]?["(OFS = "\t")" ]*([A-Z][:])?"
         _+=++_ } NR<_ || NF=_^($_+= $(_+!!_) + $NF )^!_'


id  string
a1a 82
b2b 104

The idea is to use FS to gobble up as much as possible, leaving fields that look like this before the filtering logic:

id    string            # <intermediary view>
a1a   80       2    0
b2b   100      1    3

Then simply add $3 and $4 back into $2.

For gawk only, one could even do:

gawk '
BEGIN {
    FS = "[,]?[" (OFS = "\t") " ]*([A-Z][:])?"
    print $(_ * (getline))
} $NF += $NF + $--NF + ! --NF '

id  string
a1a 82
b2b 104

The mawk equivalent:

  • The two differ in how each keeps track of increments and decrements to NF.

mawk 'BEGIN { FS="[,]?["(OFS="\t")" ]*([A-Z][:])?"
      print \
           $(getline*!(_+=++_)) } $_+=$NF+$--NF+!--NF'

id  string
a1a 82
b2b 104
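
For readers who find the golfed versions hard to follow, the same FS idea can be sketched in plain form. This is a simplified rewrite under stated assumptions, not the answer's code: it assumes every value is preceded by a tab or comma plus a single uppercase letter and a colon, and that each row carries exactly three values, as in the sample. The name sum_by_fs is hypothetical.

```shell
# sum_by_fs is a hypothetical name; the FS regex is a simplified stand-in
# for the one used in the answer above.
sum_by_fs() {
    awk '
    BEGIN {
        FS = "[\t,][A-Z]:"   # separator swallows the delimiter and the label
        OFS = "\t"
    }
    NR == 1 { print; next }             # header passes through unchanged
    { print $1, $2 + $3 + $4 }          # fields are now: id and three numbers
    '
}
printf 'id\tstring\na1a\tB:80,V:2,Z:0\nb2b\tB:100,V:1,Z:3\n' | sum_by_fs
```

Because the separator itself consumes the labels, no split() call or loop is needed; the trade-off is that the number of values per row is hard-coded.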