Reset NR in awk
cat file.txt
MNS GYPA*N
MNS GYPA*M c.59T>C;c.71A>G;c.72G>T
MNS GYPA*Mc c.71G>A;c.72T>G
MNS GYPA*Vw c.140C>T
MNS GYPA*Mg c.68C>A
MNS GYPA*Vr c.197C>A
MNS GYPB*Mta c.230C>T
MNS GYPB*Ria c.226G>A
MNS GYPB*Nya c.138T>A
MNS GYPA*Hut c.140C>A
.
.
.
The second-column values can start with GYPA, GYPB, GYPC, GYPD, ... GYPZ. I would like to set a position count for each GYP* and split the third column, as follows:
1 MNS GYPA*N
2 MNS GYPA*M c.59T>C
2 MNS GYPA*M c.71A>G
2 MNS GYPA*M c.72G>T
3 MNS GYPA*Mc c.71G>A
3 MNS GYPA*Mc c.72T>G
4 MNS GYPA*Vw c.140C>T
5 MNS GYPA*Mg c.68C>A
6 MNS GYPA*Vr c.197C>A
1 MNS GYPB*Mta c.230C>T
2 MNS GYPB*Ria c.226G>A
3 MNS GYPB*Nya c.138T>A
4 MNS GYPB*Hut c.140C>A
.
.
.
cat format.awk
BEGIN {FS=OFS="\t"}
$2 ~ /GYPA/
{ num=split($3,arr,/;/);
for (i=1;i<=num;i++)
{ print NR,$1,$2,arr[i]}}
$2 ~ /GYPB/
{ num=split($3,arr,/;/);
for (i=1;i<=num;i++)
{ print NR,$1,$2,arr[i]} }
...
I am not sure how to reset NR when it reaches the next ~ GYP. The GYP{A..Z} groups are in order from A to Z.
awk '
{
  match($2,/[^*]*/)
  gy_value=substr($2,RSTART,RLENGTH)
}
gy_value!=prev_gy_value{
  count=0
}
!arr[$2]++{
  count++
}
{
  num=split($3,array,";")
  for(i=1;i<=num;i++){
    print count,$1,$2,array[i]
  }
}
NF<3{
  print count,$0
}
{
  prev_gy_value=gy_value
}
' file.txt
Explanation: a detailed, commented version of the code above.
awk '                                   ##Start of the awk program.
{
  match($2,/[^*]*/)                     ##Match everything before the first * in the 2nd field.
  gy_value=substr($2,RSTART,RLENGTH)    ##Store that matched part of the 2nd field (e.g. GYPA) in gy_value.
}
gy_value!=prev_gy_value{                ##If the gene differs from the one on the previous line...
  count=0                               ##...reset count to 0.
}
{
  count++                               ##Increase count by 1 for every line.
}
{
  num=split($3,array,";")               ##Split the 3rd field on ; into array; the number of pieces goes into num.
  for(i=1;i<=num;i++){                  ##Loop from i=1 up to num.
    print count,$1,$2,array[i]          ##Print count, $1, $2 and the array element at index i.
  }
}
NF<3{                                   ##If the line has fewer than 3 fields (nothing to split)...
  print count,$0                        ##...print count followed by the whole line.
}
{
  prev_gy_value=gy_value                ##Remember the current gene in prev_gy_value for the comparison on the next line.
}
' Input_file                            ##Name of the input file.
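The approach above compares each line with the previous one, so it relies on all rows for one gene being contiguous. If that cannot be guaranteed, one possible alternative is to keep a separate counter per gene in an associative array. The following is only a sketch under assumptions: number_per_gene, pos, n and parts are hypothetical names, and fields are assumed to be whitespace-separated as in the answer above.

```shell
# Hypothetical helper: numbers each allele within its gene, even if genes are interleaved.
number_per_gene() {
  awk '{
    gene = $2
    sub(/\*.*/, "", gene)                      # keep the text before the first "*"
    if (!($2 in pos)) pos[$2] = ++n[gene]      # first time this allele is seen: next number for its gene
    num = split($3, parts, ";")                # num is 0 when there is no 3rd field
    if (num == 0) print pos[$2], $1, $2
    for (i = 1; i <= num; i++) print pos[$2], $1, $2, parts[i]
  }'
}

# Usage: number_per_gene < file.txt
```

With contiguous input this should number rows the same way as the previous-key comparison; the difference only shows when a gene reappears later in the file, in which case its count continues instead of restarting.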
"I am not sure how to reset NR when it reaches the next ~ GYP. The GYP{A..Z} are in order from A to Z."
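Internal awk variables such as NR, FNR, or NF are maintained by awk itself. Assigning to them is possible, but awk keeps updating them as it reads records, so they are a poor choice for a custom counter. The simplest approach is to keep a replacement for NR in a variable of your own, named c or anything else. That variable can be reset to whatever value you want under any condition.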
Example: a counter that is reset to 1 every time foo is seen in a record:
awk '{c++}($0 ~ /foo/){c=1}{print c,$0}'
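To see the counter in action, the one-liner can be wrapped in a small shell function and fed a few sample lines. The function name count_since_foo and the sample input are made up for illustration.

```shell
# Hypothetical wrapper around the one-liner so it can read any input on stdin.
count_since_foo() {
  awk '{c++} ($0 ~ /foo/){c=1} {print c, $0}'
}

printf 'foo\nbar\nbaz\nfoo\nqux\n' | count_since_foo
# 1 foo
# 2 bar
# 3 baz
# 1 foo
# 2 qux
```

NR itself keeps running from 1 to 5 here; only the user-defined c restarts.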
In the OP's case, something like this could be used:
awk 'BEGIN{FS=OFS="\t"}
     {c++; key=substr($2,1,index($2,"*")-1)}
     (key != key_prev) { c=1 }
     { prefix="" }
     (key == "GYPA") { prefix="NM_002099.7:"}
     { num=split($3,a,";"); for(i=1;i<=num;++i) print c,$1,$2,prefix a[i] }
     { key_prev=key }' file
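One detail to watch: split returns 0 when the third field is empty, so a row such as GYPA*N, which has no variants, would not be printed by the loop. A possible variant that still emits such rows is sketched below. It assumes tab-separated input grouped by gene, leaves out the prefix handling for brevity, and renumber_gyp is a hypothetical name.

```shell
# Hypothetical helper: restart the counter for each gene and keep rows without variants.
renumber_gyp() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    key = substr($2, 1, index($2, "*") - 1)   # gene name before the "*"
    c = (key == key_prev) ? c + 1 : 1          # restart at 1 when the gene changes
    key_prev = key
    num = split($3, a, ";")                    # 0 when there is no 3rd field
    if (num == 0) { print c, $1, $2; next }    # no variants: still emit the row
    for (i = 1; i <= num; ++i) print c, $1, $2, a[i]
  }'
}

# Usage: renumber_gyp < file
```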