How to replace particular column values in a file with columns from another file
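I know similar questions have been answered many times on SO (one example is here).
However, my case is different because I need to handle a specific layout.
The header of file1, the file I want to update, is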
3 6 0 6.0361821 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.994429353 0.000000000 0.000000000
0.000000000 0.994429353 0.000000000
0.000000000 0.000000000 2.469627493
1 'A ' 63548.626894188397
2 'B ' 169717.29799472401
3 'C ' 25598.367262405900
1 2 0.7458220147 0.7458220147 1.8031927376 << need to be updated from here
2 2 0.2486073382 0.2486073382 0.6664347554
3 1 0.2486073382 0.2486073382 2.2628589536
4 1 0.7458220147 0.7458220147 0.2067685394
5 3 0.7458220147 0.7458220147 1.0275486366
6 3 0.2486073382 0.2486073382 1.4420788564 << upto here
T
21.3496599 0.0000000 0.0000000
0.0000000 21.3496599 0.0000000
0.0000000 0.0000000 24.1101752
1
-7.6119990 -0.0000000 0.0000000
0.0000000 -7.6119990 0.0000000
0.0000000 0.0000000 -7.0331945
2
-7.6119990 0.0000000 0.0000000
-0.0000000 -7.6119990 0.0000000
0.0000000 0.0000000 -7.0331945
3
3.4711749 0.0000000 0.0000000
I need to update $3, $4 and $5 of file1, from line "$ESPi" to line "$ESPf", with $1, $2 and $3 of file2 (shown below). The whitespace in file1 must not change during the update. Here "$ESPi" and "$ESPf" stand for lines 8 and 13, respectively, and they vary from case to case.
file2 is
0.750000000 0.750000000 0.730147661 << with these data
0.250000000 0.250000000 0.269852339
0.250000000 0.250000000 0.916275414
0.750000000 0.750000000 0.083724586
0.750000000 0.750000000 0.416074343
0.250000000 0.250000000 0.583925657 < upto these data
This is what I have tried so far:
#!/bin/bash
for j in `seq "$ESPi" 1 "$ESPf"` # ESPi and ESPf are 8 and 13, respectively, here; they change from case to case.
do
    ESP1=$(cat file1 | head -n "$j" | tail -n 1 | awk '{print $3}')
    ESP2=$(cat file1 | head -n "$j" | tail -n 1 | awk '{print $4}')
    ESP3=$(cat file1 | head -n "$j" | tail -n 1 | awk '{print $5}')
    for k in `seq 1 1 "$NELEMENTS"` # NELEMENTS is six here.
    do
        qeIN1=$(cat file2 | head -n "$k" | tail -n 1 | awk '{print $1}')
        qeIN2=$(cat file2 | head -n "$k" | tail -n 1 | awk '{print $2}')
        qeIN3=$(cat file2 | head -n "$k" | tail -n 1 | awk '{print $3}')
        sed 's/'$ESP1'/'$qeIN1'/g' file1
        sed 's/'$ESP2'/'$qeIN2'/g' file1
        sed 's/'$ESP3'/'$qeIN3'/g' file1
    done
done
This gives me
3 6 0 6.0361821 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.994429353 0.000000000 0.000000000
0.000000000 0.994429353 0.000000000
0.000000000 0.000000000 2.469627493
1 'A ' 63548.626894188397
2 'B ' 169717.29799472401
3 'C ' 25598.367262405900
1 2 0.7458220147 0.7458220147 1.8031927376
2 2 0.750000000 0.750000000 0.6664347554
3 1 0.750000000 0.750000000 2.2628589536
4 1 0.7458220147 0.7458220147 0.2067685394
5 3 0.7458220147 0.7458220147 1.0275486366
6 3 0.750000000 0.750000000 1.4420788564
T
21.3496599 0.0000000 0.0000000
0.0000000 21.3496599 0.0000000
0.0000000 0.0000000 24.1101752
1
-7.6119990 -0.0000000 0.0000000
0.0000000 -7.6119990 0.0000000
0.0000000 0.0000000 -7.0331945
2
-7.6119990 0.0000000 0.0000000
-0.0000000 -7.6119990 0.0000000
0.0000000 0.0000000 -7.0331945
3
3.4711749 0.0000000 0.0000000
The expected output is
3 6 0 6.0361821 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.994429353 0.000000000 0.000000000
0.000000000 0.994429353 0.000000000
0.000000000 0.000000000 2.469627493
1 'A ' 63548.626894188397
2 'B ' 169717.29799472401
3 'C ' 25598.367262405900
1 2 0.750000000 0.750000000 0.730147661
2 2 0.250000000 0.250000000 0.269852339
3 1 0.250000000 0.250000000 0.916275414
4 1 0.750000000 0.750000000 0.083724586
5 3 0.750000000 0.750000000 0.416074343
6 3 0.250000000 0.250000000 0.583925657
T
21.3496599 0.0000000 0.0000000
0.0000000 21.3496599 0.0000000
0.0000000 0.0000000 24.1101752
1
-7.6119990 -0.0000000 0.0000000
0.0000000 -7.6119990 0.0000000
0.0000000 0.0000000 -7.0331945
2
-7.6119990 0.0000000 0.0000000
-0.0000000 -7.6119990 0.0000000
0.0000000 0.0000000 -7.0331945
3
3.4711749 0.0000000 0.0000000
I am looking for a shell (bash) script.
gawk 'FNR==NR{ a[i++]=$0 }
      FNR!=NR && FNR>=8 && FNR<=13{ split(a[j++],b); $3=b[1]; $4=b[2]; $5=b[3]; }
      FNR!=NR{ print $0 }' file2 file1
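FNR = the input record number in the current input file.
NR = the total number of input records seen so far.
Lines 8 and 13 are hard-coded in this script, because no information was given on how those values are determined.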
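To make the two counters concrete, here is a small Python sketch of how they advance when file2 is read first and file1 second. The helper number_records and the sample lines are hypothetical and exist only for this illustration; they are not part of awk.

```python
def number_records(files):
    """Yield (NR, FNR, name, line) for each line of each input, in order.

    NR keeps counting across all inputs; FNR restarts at 1 for every input.
    """
    nr = 0
    for name, lines in files:
        for fnr, line in enumerate(lines, start=1):
            nr += 1
            yield nr, fnr, name, line


# Made-up stand-ins for the two input files, in command-line order.
files = [("file2", ["a", "b"]), ("file1", ["x", "y", "z"])]
records = list(number_records(files))

# NR == FNR selects the lines of the first input only.
first_input = [line for nr, fnr, name, line in records if nr == fnr]
# NR != FNR selects everything that comes after the first input.
later_inputs = [line for nr, fnr, name, line in records if nr != fnr]
```

This is why the FNR==NR block stores file2 and the FNR!=NR blocks act on file1. The test only separates the files when the first file is non-empty; with an empty file2, FNR==NR would also be true for file1.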
EDIT:
I forgot to preserve the whitespace; the following version should do that:
gawk 'FNR==NR{ a[i++]=$0 }
      FNR!=NR && FNR>=8 && FNR<=13{ split(a[j++],b); sub($3,b[1]); sub($4,b[2]); sub($5,b[3]); }
      FNR!=NR{ print $0 }' file2 file1
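A whitespace-preserving edit can also be done by token position instead of by matching the old value, which avoids surprises when the same number appears earlier on the line. The following Python sketch shows the idea. The function name replace_columns and the spacing in the sample line are assumptions made for illustration, since the real column spacing of file1 is not visible here.

```python
import re


def replace_columns(line, new_values, first_col=3):
    """Replace consecutive whitespace-separated columns, keeping all spacing.

    first_col is 1-based; new_values[0] goes into that column, new_values[1]
    into the next one, and so on.
    """
    tokens = list(re.finditer(r"\S+", line))
    out, pos = [], 0
    for offset, value in enumerate(new_values):
        tok = tokens[first_col - 1 + offset]
        out.append(line[pos:tok.start()])  # text before the token, untouched
        out.append(value)                  # the replacement value
        pos = tok.end()
    out.append(line[pos:])                 # remainder, including the newline
    return "".join(out)


# Hypothetical spacing; only the numbers come from the question.
old = "   1  2   0.7458220147  0.7458220147  1.8031927376\n"
new = replace_columns(old, "0.750000000 0.750000000 0.730147661".split())
```

Every run of whitespace between tokens is copied through unchanged. Because the new values are shorter than the old ones, later columns start further left even though the separators are the same.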
#!/bin/bash
ESPi=8
ESPf=13
python > file1.new <<EOF
import sys, re
write = sys.stdout.write
espi = $ESPi
espf = $ESPf
# number of file1 tokens to skip -> index of the file2 token to insert
repls = {2:0, 3:1, 4:2}
with open("file1") as f1, open("file2") as f2:
    # copy the lines before the block unchanged
    for i in range(espi - 1): write(next(f1))
    for i in range(espf - espi + 1):
        line = next(f1)
        toks = next(f2).split()
        for col, rcol in repls.items():
            # skip col tokens, replace the next one, keep all whitespace
            pat = r"(\s*)((\S+\s+){{{col}}})(\S+)(.*)".format(col=col)
            repl = r"\g<1>\g<2>{val}\g<5>".format(val=toks[rcol])
            line = re.sub(pat, repl, line)
        write(line)
    # copy the remaining lines unchanged
    for line in f1: write(line)
EOF
mv file1.new file1
This will work using any awk in any shell on every UNIX box:
$ cat tst.awk
NR==FNR { new[NR]=$0; next }
(espi <= FNR) && (FNR <= espf) {
    split(new[FNR-espi+1],vals)
    i = 0
    while ( match($0,/[^[:space:]]+/) ) {
        printf "%s%s", substr($0,1,RSTART-1), (++i >= 3 ? vals[i-2] : substr($0,RSTART,RLENGTH))
        $0 = substr($0,RSTART+RLENGTH)
    }
}
{ print }

$ awk -v espi=8 -v espf=13 -f tst.awk file2 file1
3 6 0 6.0361821 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.994429353 0.000000000 0.000000000
0.000000000 0.994429353 0.000000000
0.000000000 0.000000000 2.469627493
1 'A ' 63548.626894188397
2 'B ' 169717.29799472401
3 'C ' 25598.367262405900
1 2 0.750000000 0.750000000 0.730147661
2 2 0.250000000 0.250000000 0.269852339
3 1 0.250000000 0.250000000 0.916275414
4 1 0.750000000 0.750000000 0.083724586
5 3 0.750000000 0.750000000 0.416074343
6 3 0.250000000 0.250000000 0.583925657
T
21.3496599 0.0000000 0.0000000
0.0000000 21.3496599 0.0000000
0.0000000 0.0000000 24.1101752
1
-7.6119990 -0.0000000 0.0000000
0.0000000 -7.6119990 0.0000000
0.0000000 0.0000000 -7.0331945
2
-7.6119990 0.0000000 0.0000000
-0.0000000 -7.6119990 0.0000000
0.0000000 0.0000000 -7.0331945
3
3.4711749 0.0000000 0.0000000
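Since the requirement is that the whitespace of file1 stays unchanged, the result of any of the approaches above can be checked by comparing the whitespace runs of each line before and after the update. Here is a minimal Python sketch, assuming a hypothetical helper same_layout and made-up sample lines:

```python
import re


def same_layout(before, after):
    """Return True if both lines have identical whitespace around their tokens.

    Splitting on runs of non-whitespace leaves only the separators, so the
    comparison ignores what the tokens are and looks at spacing alone.
    """
    return re.split(r"\S+", before) == re.split(r"\S+", after)


# Same separators, different values: the layout is considered preserved.
kept = same_layout("  1  2   0.7458\n", "  1  2   0.7500\n")
# One separator collapsed from two spaces to one: the layout changed.
changed = not same_layout("1  2\n", "1 2\n")
```

To check a whole file, apply the same comparison line by line to the original file1 and the updated output.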