awk to split and count number in field
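In the awk below I am trying to skip the header, extract each number between the : and the , in $2, then print $1 and the count (the sum of those numbers), with the header put back in the output. My current output seems to duplicate each line and print the line as is. There may be empty columns in each line of the input, but it will always be tab-delimited. Thank you :).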
awk
awk 'BEGIN{FS=OFS="\t"}; NR>1 {gsub(/:,/,"",$2); {count[$2]++} print $1,$count} FNR>1' file
Also tried:
awk -F'\t' '{gsub(/:,/,"",$2); {count[$2]++}
  END{print "id","string";
      print $1,count}}' file | column -t
file (tab-delimited)
id string
a1a B:80,V:2,Z:0
b2b B:100,V:1,Z:3
current output (tab-delimited)
a1a
a1a B:80,V:2,Z:0
b2b
b2b B:100,V:1,Z:3
desired output (tab-delimited)
id string
a1a 82
b2b 104
You may use this awk:
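awk '
BEGIN{FS=OFS="\t"}
NR == 1 {
   print
   next
}
n = split($2, a, /,/) {          # pattern is false (row skipped) when $2 is empty
   s = 0
   for (i=1; i<=n; ++i) {
      sub(/[^:]*:/, "", a[i])    # strip the label and colon, e.g. "B:80" -> "80"
      s += a[i]+0
   }
   print $1, s
}' file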
id string
a1a 82
b2b 104
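To make the sub() step easier to follow, here is a small trace of that loop on a single field value. This is only a sketch: the value is hard-coded from the sample input, and the `trace` variable and the `+`/`=` formatting are just for illustration.

```shell
# Trace the split-then-strip loop on one sample value (hard-coded, illustrative only).
trace=$(awk 'BEGIN {
    n = split("B:100,V:1,Z:3", a, /,/)      # pieces: B:100  V:1  Z:3
    s = 0
    for (i = 1; i <= n; ++i) {
        sub(/[^:]*:/, "", a[i])             # drop the label and its colon
        s += a[i] + 0
        printf "%s%s", a[i], (i < n ? "+" : "=")
    }
    print s
}')
printf '%s\n' "$trace"
```

Each piece is reduced to its number before it is added, so the labels never take part in the arithmetic.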
With your shown samples, please try the following awk code.
awk '
BEGIN { FS=OFS="\t" }
FNR==1{
  print
  next
}
{
  sum=0
  num=split($2,arr,"[:,]")
  for(i=2;i<=num;i+=2){
    sum+=arr[i]
  }
  print $1,sum
}
' Input_file
With your shown samples, the output will be as follows:
id string
a1a 82
b2b 104
Explanation: adding a detailed explanation for the above code.
awk '                              ##Starting awk program from here.
BEGIN { FS=OFS="\t" }              ##In BEGIN section setting FS and OFS to \t here.
FNR==1{                            ##Checking if this is the first line, then do the following.
  print                            ##Printing current line here.
  next                             ##next will skip all further statements from here.
}
{
  sum=0                            ##Resetting sum to 0 here.
  num=split($2,arr,"[:,]")         ##Splitting 2nd field into array arr on the delimiters : and , here.
  for(i=2;i<=num;i+=2){            ##Running for loop from i=2 till num in steps of 2.
    sum+=arr[i]                    ##Adding arr[i] value to sum and keep adding it.
  }
  print $1,sum                     ##Printing 1st field and sum here.
}
' Input_file                       ##Mentioning Input_file name here.
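To see why the loop starts at i=2 and steps by 2, it may help to list what split() puts into the array for one sample value. A minimal sketch, with the value hard-coded from the sample input and a `pieces` variable that exists only for this illustration:

```shell
# List the array elements produced by splitting one sample value on ":" or ",".
pieces=$(awk 'BEGIN {
    n = split("B:80,V:2,Z:0", arr, "[:,]")
    for (i = 1; i <= n; i++)
        printf "%d=%s%s", i, arr[i], (i < n ? " " : "\n")
}')
printf '%s\n' "$pieces"
```

The labels land on the odd indexes and the numbers on the even ones, which is what the i+=2 loop relies on.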
$ awk '
    BEGIN { FS=OFS="\t" }
    NR>1 {
        n = split($2,a,/[:,]/)
        sum = 0
        for ( i=2; i<=n; i+=2 ) {
            sum += a[i]
        }
        $2 = sum
    }
    { print }
' file
id string
a1a 82
b2b 104
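Since the question mentions that some columns may be empty, here is a quick way to check how this split-and-sum approach treats such a row. It is a sketch under assumptions: the `c3c` row with an empty second column is made up for the test and is not part of the original sample, and the input is fed through printf instead of a file.

```shell
# Sample rows from the question plus one hypothetical row (c3c) with an empty 2nd column.
result=$(printf 'id\tstring\na1a\tB:80,V:2,Z:0\nb2b\tB:100,V:1,Z:3\nc3c\t\n' |
awk '
    BEGIN { FS = OFS = "\t" }
    NR > 1 {
        n = split($2, a, /[:,]/)      # n is 0 when $2 is empty
        sum = 0
        for (i = 2; i <= n; i += 2)
            sum += a[i]
        $2 = sum
    }
    { print }
')
printf '%s\n' "$result"
```

With this variant the empty row is kept and gets a sum of 0; if such rows should be dropped instead, a guard on n could be added before the print.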
awk -F'[\t:,]' 'NR==1{print}NF==7{print $1,$3+$5+$7}' input_file|column -t
id string
a1a 82
b2b 104
An awk solution that needs neither arrays nor loops:
< input_file.txt |
{m,g}awk 'BEGIN { FS = "[,]?["(OFS = "\t")" ]*([A-Z][:])?"
_+=++_ } NR<_ || NF=_^($_+= $(_+!!_) + $NF )^!_'
id string
a1a 82
b2b 104
The idea is to have FS consume as much as possible, leaving fields that look like this before the filtering logic runs:
id string # <intermediary view>
a1a 80 2 0
b2b 100 1 3
Then simply add $3 and $4 back into $2.
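For anyone who wants to check the intermediary view, a sketch like the following should reproduce it. Assumptions: FS is written as a plain string here instead of embedding the OFS assignment inside it, the sample rows are fed through printf, and the `view` variable is only for illustration.

```shell
# Print every record after field splitting, with fields re-joined by tabs.
view=$(printf 'id\tstring\na1a\tB:80,V:2,Z:0\nb2b\tB:100,V:1,Z:3\n' |
awk '
    BEGIN { FS = "[,]?[\t ]*([A-Z][:])?"; OFS = "\t" }
    { $1 = $1; print }     # assigning $1 rebuilds $0 with OFS between fields
')
printf '%s\n' "$view"
```

The separator regex swallows the tab or comma together with the label and colon that follow it, so only the id and the bare numbers remain as fields.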
For gawk only, one can even do:
gawk '
BEGIN {
FS = "[,]?[" (OFS = "\t") " ]*([A-Z][:])?"
print $(_ * (getline))
} $NF += $NF + $--NF + ! --NF '
id string
a1a 82
b2b 104
The mawk equivalent (the difference lies in how each of them tracks increments and decrements to NF):
mawk 'BEGIN { FS="[,]?["(OFS="\t")" ]*([A-Z][:])?"
print \
$(getline*!(_+=++_)) } $_+=$NF+$--NF+!--NF'
id string
a1a 82
b2b 104