Reset NR in awk

Question

cat file.txt

MNS GYPA*N  
MNS GYPA*M  c.59T>C;c.71A>G;c.72G>T
MNS GYPA*Mc c.71G>A;c.72T>G
MNS GYPA*Vw c.140C>T
MNS GYPA*Mg c.68C>A
MNS GYPA*Vr c.197C>A
MNS GYPB*Mta    c.230C>T
MNS GYPB*Ria    c.226G>A
MNS GYPB*Nya    c.138T>A
MNS GYPA*Hut    c.140C>A
.
.
.

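The second column can start with GYPA, GYPB, GYPC, GYPD, ... GYPZ. I want a position counter that restarts for each GYP* group, and I want the third column split onto separate lines, like this: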

1   MNS  GYPA*N
2   MNS GYPA*M  c.59T>C
2   MNS GYPA*M  c.71A>G
2   MNS GYPA*M  c.72G>T
3   MNS GYPA*Mc c.71G>A
3   MNS GYPA*Mc c.72T>G
4   MNS GYPA*Vw c.140C>T
5   MNS GYPA*Mg c.68C>A
6   MNS GYPA*Vr c.197C>A
1   MNS GYPB*Mta    c.230C>T
2   MNS GYPB*Ria    c.226G>A
3   MNS GYPB*Nya    c.138T>A
4   MNS GYPB*Hut    c.140C>A
.
.
.

cat format.awk

BEGIN {FS=OFS="\t"}

$2 ~ /GYPA/ {
      num=split($3,arr,/;/);
      for (i=1;i<=num;i++)
         { print NR,$1,$2,arr[i]}}

$2 ~ /GYPB/ {
      num=split($3,arr,/;/);
      for (i=1;i<=num;i++)
         { print NR,$1,$2,arr[i]} }
...

I am not sure how to reset NR when it reaches the next $2 ~ GYP. The GYP{A..Z} groups appear in order from A to Z.

Answer 1

awk '
{
  match($2,/[^*]*/)
  gy_value=substr($2,RSTART,RLENGTH)
}
gy_value!=prev_gy_value{
  count=0
}
!arr[$2]++{
  count++
}
{
  num=split($3,array,";")
  for(i=1;i<=num;i++){
    print count,$1,$2,array[i]
  }
}
NF<3{
  print count,$1,$2
}
{
  prev_gy_value=gy_value
}
' file.txt

Explanation: a commented version of the code above.

awk '                                   ## Start of the awk program.
{
  match($2,/[^*]*/)                     ## Match everything before the first * in the 2nd field.
  gy_value=substr($2,RSTART,RLENGTH)    ## Store the matched prefix (e.g. GYPA) in gy_value.
}
gy_value!=prev_gy_value{                ## The prefix differs from the previous line: a new group starts.
  count=0                               ## Reset count to 0.
}
!arr[$2]++{                             ## First time this 2nd-field value is seen.
  count++                               ## Increment count by 1.
}
{
  num=split($3,array,";")               ## Split the 3rd field on ; into array; num holds the number of pieces.
  for(i=1;i<=num;i++){                  ## Loop over the pieces, from 1 to num.
    print count,$1,$2,array[i]          ## Print count, the 1st and 2nd fields, and the current piece.
  }
}
NF<3{                                   ## Fewer than 3 fields: there is no 3rd field to split.
  print count,$1,$2                     ## Print the line with its count.
}
{
  prev_gy_value=gy_value                ## Remember this prefix for the comparison on the next line.
}
'  Input_file                           ## Name of the input file.
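
To make the match/RSTART/RLENGTH step easier to follow, here is a minimal sketch that runs it on a few made-up values. The function name gene_prefix and the sample strings are assumptions for illustration only; they are not part of the answer.

```shell
# Hypothetical helper: print each value with RSTART, RLENGTH and the extracted prefix.
gene_prefix() {
  awk '{
    match($1, /[^*]*/)                              # longest run of non-* characters, starting at position 1
    print $1, RSTART, RLENGTH, substr($1, RSTART, RLENGTH)
  }'
}

printf '%s\n' 'GYPA*N' 'GYPB*Mta' 'GYPC' | gene_prefix
# GYPA*N 1 4 GYPA
# GYPB*Mta 1 4 GYPB
# GYPC 1 4 GYPC
```

A value without any * is matched in full, so it becomes its own prefix.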

Answer 2

I am not sure how to reset NR when it reaches the next $2 ~ GYP. The GYP{A..Z} are in order from A to Z.

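Resetting or changing built-in awk variables such as NR, FNR, or NF is not a good idea: awk maintains these values itself. The easiest approach is to keep a surrogate for NR in a variable of your own, named c or anything else. That variable can be reset to any value under any condition you like.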

Example: a counter that resets to 1 every time foo is seen in a record:

awk '{c++}($0 ~ /foo/){c=1}{print c,$0}'
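
As a quick check of the surrogate-counter idea, the sketch below prints NR next to the counter c on a made-up input, so the difference is visible. The function name count_since_foo and the sample records are assumptions for illustration.

```shell
# Hypothetical helper: NR keeps running, while c restarts at 1 on every record containing foo.
count_since_foo() {
  awk '{ c++ } ($0 ~ /foo/) { c = 1 } { print NR, c, $0 }'
}

printf '%s\n' a foo b c foo d | count_since_foo
# 1 1 a
# 2 1 foo
# 3 2 b
# 4 3 c
# 5 1 foo
# 6 2 d
```

NR is never assigned here; only the user variable c is reset.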

In the OP's case, something like this could be used:

awk 'BEGIN{FS=OFS="\t"}
     {c++; key=substr($2,1,index($2,"*")-1)}
     (key != key_prev) { c=1 }
     { prefix="" }
     (key == "GYPA") { prefix="NM_002099.7:"}
     { num=split($3,a,";"); for(i=1;i<=num;++i) print c,$1,$2,prefix a[i] }
     { key_prev=key }' file
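
One case worth covering separately is a row with an empty third column, such as GYPA*N in the question: split returns 0 for it, so a loop over the pieces prints nothing. The following is a minimal sketch under some assumptions, not a definitive version: the function name number_alleles is made up, input fields are split on any whitespace (awk's default), output is tab-separated, and the prefix logic is left out.

```shell
# Hypothetical helper: number alleles per gene prefix and split the variant list on ";".
number_alleles() {
  awk 'BEGIN { OFS = "\t" }
  {
    key = substr($2, 1, index($2, "*") - 1)    # gene prefix before the *, e.g. GYPA
    c = (key == key_prev) ? c + 1 : 1          # restart at 1 when the prefix changes
    key_prev = key
    num = split($3, a, ";")                    # num is 0 when the 3rd field is empty
    if (num == 0) { print c, $1, $2; next }    # no variants: still print the allele once
    for (i = 1; i <= num; i++) print c, $1, $2, a[i]
  }'
}

printf '%s\n' 'MNS GYPA*N' 'MNS GYPA*M c.59T>C;c.71A>G' 'MNS GYPB*Mta c.230C>T' | number_alleles
```

With this input the counter is 1 for GYPA*N, 2 for both GYPA*M lines, and 1 again for GYPB*Mta.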
