Change numbers to specific names in two different columns using a bash script (awk, sed, etc.)
My input (a small part of my file; I also have to run this on 100 files):
86834 SOL4504
86955 SOL5240
86963 SOL4251
SOL15 38222
SOL17 35642
SOL110 41053
My desired output:
MGD674 SOL4504
MGD675 SOL5240
MGD675 SOL4251
SOL15 MGD297
SOL17 MGD277
SOL110 MGD319
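In my program I want to change numbers to specific names. Numbers from 1 to 129 become the name MGD1 (for example, number 1 becomes MGD1, number 92 becomes MGD1, number 12905 becomes MGD101, and so on). I have to do this in 100 files.
At first I wanted to do it this way, but you are welcome to write completely different code: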
#!/bin/bash
MGD_atom_index=1
number=1
MGD_mol_index=MGD$number
for index in {1..100}   # I run this script on 100 files, hence the for loop
do
    # I run this 900 times per file, once per name. The highest number is
    # 116100, so the last name is MGD900 (116100/129 = 900).
    for MGD_index in {1..900}
    do
        # This line becomes very long, because I have to keep writing
        # s/$((MGD_atom_index+x))/$MGD_mol_index/g up to MGD_atom_index+128.
        sed -i "s/$MGD_atom_index/$MGD_mol_index/g;s/$((MGD_atom_index+1))/$MGD_mol_index/g;s/$((MGD_atom_index+2))/$MGD_mol_index/g; ... ;s/$((MGD_atom_index+128))/$MGD_mol_index/g" "new2_$index.ndx"
        # Move to the next block: after 1 to 129 (MGD1) come 130 to 258 (MGD2)
        MGD_atom_index=$((MGD_atom_index+129))
        number=$((number+1))
        MGD_mol_index=MGD$number   # now find and replace with MGD2
    done
    # Reset all variables to one before working on the next file
    MGD_atom_index=1
    number=1
    MGD_mol_index=MGD$number
done
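But I have a problem: this code will be extremely long, because I have to write s/$(($MGD_atom_index+x))/$MGD_mol_index/g; once for every x from 1 to 128, on top of the first substitution.
I also think my program may be very slow. Is there a better way to do this?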
I think this awk is what you need:
awk '
$1~/^[0-9]+$/{$1="MGD" int($1/129+1)}
$2~/^[0-9]+$/{$2="MGD" int($2/129+1)}
1
' file
$ cat tst.awk
BEGIN { grp = 129 }
{
    for (i=1; i<=NF; i++) {
        if ( $i == ($i+0) ) {
            $i = "MGD" (int($i/grp)+1)
        }
    }
    print
}
$ awk -f tst.awk file
MGD674 SOL4504
MGD675 SOL5240
MGD675 SOL4251
SOL15 MGD297
SOL17 MGD277
SOL110 MGD319
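One edge case may be worth checking. The question says numbers 1 to 129 should all become MGD1, but int(129/129)+1 evaluates to 2, so exact multiples of 129 land one group too high with the formula above. Below is a minimal sketch of a variant that subtracts 1 before dividing. It assumes the groups really are 1-129, 130-258, and so on, as the question describes; the function name mgd_rename is made up for illustration and does not come from the answers.

```shell
# Hypothetical helper: rename purely numeric fields read from stdin.
# Assumption: groups are 1-129 -> MGD1, 130-258 -> MGD2, ...
mgd_rename() {
    awk -v grp=129 '
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9]+$/) {
                # subtract 1 first so that 129 stays in group 1
                $i = "MGD" (int(($i - 1) / grp) + 1)
            }
        }
        print
    }'
}

# 129 is the last atom of MGD1, 130 the first atom of MGD2
printf '%s\n' '129 SOL1' 'SOL2 130' '86834 SOL4504' | mgd_rename
```

For the sample input in the question both formulas give the same names, since none of those numbers is a multiple of 129.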
So then you want this shell script, using GNU awk for "inplace" editing:
#!/usr/bin/env bash
awk -i inplace '
BEGIN { grp = 129 }
{
    for (i=1; i<=NF; i++) {
        if ( $i == ($i+0) ) {
            $i = "MGD" (int($i/grp)+1)
        }
    }
    print
}
' 'new2_'{1..100}'.ndx'
Or with any awk:
#!/usr/bin/env bash
tmp=$(mktemp) || exit 1
for index in {1..100}; do
    awk '
    BEGIN { grp = 129 }
    {
        for (i=1; i<=NF; i++) {
            if ( $i == ($i+0) ) {
                $i = "MGD" (int($i/grp)+1)
            }
        }
        print
    }
    ' "new2_$index.ndx" > "$tmp" && mv "$tmp" "new2_$index.ndx"
done
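Since the loop overwrites each file, it may be worth trying it on throwaway copies first. The sketch below does that in a scratch directory. The two sample files and the variable name workdir are invented for the demo, and it loops over a glob instead of {1..100} only so that it works with two files; the awk logic is the same as above.

```shell
# Trial run in a scratch directory; the file contents are made-up samples.
workdir=$(mktemp -d) || exit 1
printf '%s\n' '86834 SOL4504' 'SOL15 38222' > "$workdir/new2_1.ndx"
printf '%s\n' '86955 SOL5240' 'SOL17 35642' > "$workdir/new2_2.ndx"

tmp=$(mktemp) || exit 1
for f in "$workdir"/new2_*.ndx; do
    # write the converted text to a temp file, then replace the original
    awk -v grp=129 '
    {
        for (i = 1; i <= NF; i++)
            if ($i == ($i + 0))
                $i = "MGD" (int($i / grp) + 1)
        print
    }' "$f" > "$tmp" && mv "$tmp" "$f"
done

cat "$workdir/new2_1.ndx" "$workdir/new2_2.ndx"
```

If the printed lines look right, the same loop can be pointed at the real new2_*.ndx files.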
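This might work for you (GNU sed and bash):
sed -E 's#\b([0-9]+)\b#MGD$((\1/129+1))#g;s/.*/echo "&"/e' file
Convert every group of digits into the required format by replacing it with a shell arithmetic expression prefixed by MGD, then use the echo command to evaluate the expression.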
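To make the two stages concrete, here is a small sketch of what happens to the first sample line. It does not call sed at all; it only reproduces by hand the arithmetic expansion that the shell performs once the e flag passes the rewritten line to it. The variable name result is just for illustration.

```shell
# Stage 1: the first s command turns the line
#     86834 SOL4504
# into the text
#     MGD$((86834/129+1)) SOL4504
# Stage 2: the e flag runs  echo "MGD$((86834/129+1)) SOL4504"  in a shell,
# which evaluates the arithmetic expansion exactly like this assignment:
result="MGD$((86834/129+1)) SOL4504"
echo "$result"
```

Because every line is handed to a shell, any line that contains characters such as $, backticks, or double quotes would be interpreted by that shell as well, so this approach seems safest on files known to hold only plain names and numbers.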