How can I use Bash to pull a series of strings from one file and replace a separate series of strings in another file?

I have two files, file1.txt and file2.txt, which both contain a set of coordinates along with other information. This is what I have so far:

new_coords=$(sed -n '/Begin/,/End/{//b;p}' file1.txt)
new_coords=$(echo "${new_coords//ATOMIC_POSITIONS (angstrom)}")
old_coords=$(sed -n '/ATOMIC_POSITIONS/,/K_POINTS/{//b;p}' file2.txt)

sed -i 's|$old_coords|$new_coords|g' file2.txt

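I figured the quickest way to get at the coordinates was to use the nearest lines of text above and below them as delimiters. The only problem with my new_coords variable is that an "ATOMIC_POSITIONS" label keeps repeating until the program reaches the final atomic coordinates, so I remove it with the echo statement on the second line.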
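A minimal sketch of this delimiter approach, run on a trimmed copy of the file1.txt sample stored in a variable (the names sample, block and coords are illustrative, and GNU sed is assumed). It relies on the fact that an empty regex, //, reuses the last regex sed tried, so //b skips the two delimiter lines of the range and only the lines in between are printed.

```bash
#!/usr/bin/env bash
# Trimmed, illustrative copy of the relevant part of file1.txt.
sample='     End of BFGS Geometry Optimization
Begin final coordinates

ATOMIC_POSITIONS (angstrom)
Fe            1.0730540812        3.7648438571        1.4484500000
C             0.0000000000        0.0000000000       -0.0000000000
End final coordinates'

# Print only the lines strictly between the Begin and End markers.
# An empty regex (//) reuses the last regex sed tried, so //b skips
# both delimiter lines and p prints the ones in between.
block=$(sed -n '/Begin/,/End/{//b;p}' <<< "$sample")

# Drop the label, then delete the blank lines that remain.
coords=$(printf '%s\n' "${block//ATOMIC_POSITIONS (angstrom)}" | sed '/^$/d')
printf '%s\n' "$coords"
```

The "End of BFGS" line above the block also matches /End/, but it does not start a range because it comes before any /Begin/ match.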

When I echo new_coords and old_coords, each one seems to contain the correct output, but neither sed nor perl seems to work for the last line of code.

How can I do this kind of multi-line string replacement in Bash? Am I missing a small piece of code or some formatting?
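The behaviour of that last line can be reproduced with short stand-in values. This sketch (the strings are illustrative, and GNU sed is assumed) shows the two things that go wrong: with single quotes sed receives the literal text $old_coords, and with double quotes the newlines inside the variables break the s command.

```bash
#!/usr/bin/env bash
# Illustrative two-line stand-ins for the coordinate blocks.
old_coords=$'a\nb'
new_coords=$'X\nY'
text=$'a\nb\nc'

# 1) Single quotes suppress expansion: sed looks for the literal characters
#    "$old_coords", which never occur in the input, so nothing changes.
single=$(sed 's|$old_coords|$new_coords|g' <<< "$text")

# 2) Double quotes expand the variables, but the embedded newline ends the
#    s command early, so GNU sed rejects the script and exits non-zero.
double_status=0
sed "s|$old_coords|$new_coords|g" <<< "$text" >/dev/null 2>&1 || double_status=$?

printf 'single-quoted output:\n%s\n' "$single"
printf 'double-quoted sed exit status: %s\n' "$double_status"
```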

Example of file1.txt:

...
     bfgs converged in   9 scf cycles and   8 bfgs steps
     (criteria: energy <  6.0E-05 Ry, force <  1.0E-04 Ry/Bohr)

     End of BFGS Geometry Optimization

     Final energy   =   -1343.8825757257 Ry
Begin final coordinates

ATOMIC_POSITIONS (angstrom)
Fe            1.0730540812        3.7648438571        1.4484500000
Fe            3.2976459188        0.6816561429        1.4484500000
Fe            3.2584040812        2.9049061429        0.0000000000
Fe            1.1122959188        1.5415938571        0.0000000000
C             2.1853500000        2.2232500000        1.4484500000
C             0.0000000000        0.0000000000       -0.0000000000
End final coordinates



     Writing output data file ./out/100Co2C.save/

     init_run     :    486.03s CPU    493.55s WALL (       1 calls)
...

Example of file2.txt:

...
ATOMIC_SPECIES
C      12.0107 C.pbesol-n-kjpaw_psl.1.0.0.UPF
Co     58.933195 co_pbesol_v1.2.uspp.F.UPF
Fe     55.845 Fe.pbesol-spn-kjpaw_psl.0.2.1.UPF
ATOMIC_POSITIONS angstrom
Co            1.0085465598        3.7287218832        1.4484500000
Co            3.3775861828        0.7084455291        1.4484500000
Fe            3.2243420022        2.9272726906        0.0000000000
Co            1.1549449803        1.5394425244        0.0000000000
C             2.1517221305        2.1838545768        1.4484500000
C             0.0096081444        0.0285127960        0.0000000000
K_POINTS crystal
388
    0.0000000000     0.0000000000     0.0000000000 1
...

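The last line fails for two reasons. Inside single quotes the shell does not expand $old_coords and $new_coords, so sed searches for that literal text. And sed processes its input one line at a time, so a pattern spanning several lines can only match after the whole file has been read into the pattern space. Both strings also need escaping: characters such as . are special in a regex, and &, / and \ are special in the replacement. You can use: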

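new_coords=$(sed -n '/Begin/,/End/{//b;p}' file1.txt)
new_coords=$(echo "${new_coords//ATOMIC_POSITIONS (angstrom)}" | sed '/^$/d')
old_coords=$(sed -n '/ATOMIC_POSITIONS/,/K_POINTS/{//b;p}' file2.txt)

# quoteRe: escape a (possibly multi-line) string for use as a sed regex:
# each character except ^ goes into a bracket expression, ^ is backslash-escaped,
# and line breaks become \n
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
# quoteSubst: escape &, / and \ and turn each newline into backslash-newline,
# for use as a sed replacement string
quoteSubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
  printf %s "${REPLY%$'\n'}"
}

# Read all of file2.txt into the pattern space (:a;$!{N;ba}), then replace the block in one go.
# The result goes to stdout; with GNU sed, add -i to edit file2.txt in place.
sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$(quoteRe "$old_coords")/$(quoteSubst "$new_coords")/" file2.txt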
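To see what the two helpers actually produce, here is a small check on made-up two-line input (the values old and new are illustrative, and GNU sed is assumed because the script relies on \n matching a newline in the pattern space). quoteRe wraps every character in a bracket expression and turns line breaks into \n; quoteSubst backslash-escapes &, / and \ as well as each newline.

```bash
#!/usr/bin/env bash
# The two helpers from the answer.
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
quoteSubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
  printf %s "${REPLY%$'\n'}"
}

# Illustrative two-line values (not taken from the real files).
old=$'1.0\n-2'
new=$'3/4\n&5'

re=$(quoteRe "$old")        # [1][.][0]\n[-][2]  (a literal backslash-n)
sub=$(quoteSubst "$new")    # 3\/4, backslash-newline, \&5

# Slurp the whole input, then replace the two-line block with one s command.
result=$(sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$re/$sub/" <<< $'head\n1.0\n-2\ntail')
printf 're:  %s\n' "$re"
printf 'out:\n%s\n' "$result"
```

Because every character of the old block sits inside its own bracket expression, the . in the coordinates matches only a literal dot rather than any character.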

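The same commands as used in the online demo, where the sample contents are stored in the shell variables file1 and file2 and passed in with here-strings: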

new_coords=$(sed -n '/Begin/,/End/{//b;p}' <<< "$file1")
new_coords=$(echo "${new_coords//ATOMIC_POSITIONS (angstrom)}" | sed '/^$/d')
old_coords=$(sed -n '/ATOMIC_POSITIONS/,/K_POINTS/{//b;p}' <<< "$file2")

quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
quoteSubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
  printf %s "${REPLY%$'\n'}"
}

sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$(quoteRe "$old_coords")/$(quoteSubst "$new_coords")/" <<< "$file2"

Output:

...
ATOMIC_SPECIES
C      12.0107 C.pbesol-n-kjpaw_psl.1.0.0.UPF
Co     58.933195 co_pbesol_v1.2.uspp.F.UPF
Fe     55.845 Fe.pbesol-spn-kjpaw_psl.0.2.1.UPF
ATOMIC_POSITIONS angstrom
Fe            1.0730540812        3.7648438571        1.4484500000
Fe            3.2976459188        0.6816561429        1.4484500000
Fe            3.2584040812        2.9049061429        0.0000000000
Fe            1.1122959188        1.5415938571        0.0000000000
C             2.1853500000        2.2232500000        1.4484500000
C             0.0000000000        0.0000000000       -0.0000000000
K_POINTS crystal
388
    0.0000000000     0.0000000000     0.0000000000 1
...
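If the escaping feels fragile, a different technique (awk instead of sed, not the method above) avoids it by swapping the block line by line. This is a sketch under the assumption that the marker lines look exactly like the samples; replace_positions is a hypothetical helper name, and the test files are tiny illustrative stand-ins.

```bash
#!/usr/bin/env bash
# Alternative sketch: copy the coordinate lines from file1 and put them in
# place of the lines between ATOMIC_POSITIONS and K_POINTS in file2.
replace_positions() {  # hypothetical helper; usage: replace_positions FILE1 FILE2
  awk '
    # First file (FNR == NR): keep the non-blank lines between the Begin and
    # End markers, minus the ATOMIC_POSITIONS label.
    FNR == NR {
      if (/^Begin final coordinates/) { grab = 1; next }
      if (/^End final coordinates/)   { grab = 0; next }
      if (grab && NF && !/^ATOMIC_POSITIONS/) coords[++n] = $0
      next
    }
    # Second file: print the header, then the new block, and skip the old
    # lines until K_POINTS.
    /^ATOMIC_POSITIONS/ { print; for (i = 1; i <= n; i++) print coords[i]; skip = 1; next }
    /^K_POINTS/         { skip = 0 }
    !skip
  ' "$1" "$2"
}

# Small illustrative files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'Begin final coordinates' '' 'ATOMIC_POSITIONS (angstrom)' \
  'Fe  1.0  2.0  3.0' 'End final coordinates' > "$tmp/file1.txt"
printf '%s\n' 'ATOMIC_SPECIES' 'ATOMIC_POSITIONS angstrom' \
  'Co  9.0  9.0  9.0' 'K_POINTS crystal' > "$tmp/file2.txt"

# Write to a temporary file first, then replace file2.txt with it.
replace_positions "$tmp/file1.txt" "$tmp/file2.txt" > "$tmp/file2.new" &&
  mv "$tmp/file2.new" "$tmp/file2.txt"

result=$(cat "$tmp/file2.txt")
rm -r "$tmp"
printf '%s\n' "$result"
```

Writing to a temporary file and then using mv keeps the sketch independent of sed -i or any awk-specific in-place option.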
