How to add one space character (without changing any other characters) to "one character strings" using awk, sed, or grep?

I obtained this text file (leap.log) using sed and awk:

Template_frcmod
MASS

Pd 0.000         0.000 

BOND
Pd-c
Pd-3e
c-Pd
4p-ca
o-3e
n-3e
Pd-4e
3p-ca
o-4e
n-4e

ANGLE
Pd-c-Pd
Pd-3e-o
Pd-3e-n
Pd-1c-Pd
c-Pd-4p
c-Pd-3e
c-Pd-1c
c-Pd-3p
c-Pd-4e
4p-ca-ca
4p-Pd-3e
4p-Pd-1c
o-3e-n
3e-n-c3
3e-Pd-1c
ca-4p-ca
Pd-4e-o
Pd-4e-n
1c-Pd-4e
3p-ca-ca
3p-Pd-4e
o-4e-n
4e-n-c3
ca-3p-ca

DIHE

 Pd-4p-ca-ca
 Pd-3e-n-c3
 c-Pd-3e-o
 c-Pd-3e-n
 c-Pd-4e-o
 c-Pd-4e-n
 4p-Pd-3e-o
 4p-Pd-3e-n
 o-3e-n-c3
 o-3e-Pd-1c
 n-3e-Pd-1c
 ca-4p-ca-ca
 ca-ca-4p-ca
 Pd-3p-ca-ca
 Pd-4e-n-c3
 1c-Pd-4e-o
 1c-Pd-4e-n
 3p-Pd-4e-o
 3p-Pd-4e-n
 o-4e-n-c3
 ca-3p-ca-ca
 ca-ca-3p-ca

IMPROPER

NONBON

Now I have a problem with the "one-character" atom names:

c-Pd-4p

In this line, and in every other similar line (one that contains a one-character atom name), "c" must be two characters: "c " (with a space):

c -Pd-4p

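Or in this line: 4e-n-c3, "n" must become "n " (4e-n -c3); or in this line: Pd-c, "c" must become "c " (Pd-c ), etc. Every one-character atom name must be made two characters long by appending a space.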

When I try to change "c" to "c ", "1c" becomes "1c " as well: Pd-1c-Pd --> Pd-1c -Pd. But I don't want to change the two-character atom names; they must stay unchanged.

When I try this command:

awk 'BEGIN{FS="-"}{ if(length($1) == 1 ) $1 = $1" " } {print $0}' leap.log

This time the "-" signs disappeared. What should I do to pad all one-character atom names with a space?

Expected result (the comments are only for this question; the real file will not contain them):

Template_frcmod
MASS

Pd 0.000         0.000 

BOND
Pd-c  #Also the last "c" must be "c " 
Pd-3e
c -Pd
4p-ca
o -3e
n -3e
Pd-4e
3p-ca
o -4e
n -4e

ANGLE
Pd-c -Pd
Pd-3e-o 
Pd-3e-n 
Pd-1c-Pd
c -Pd-4p
c -Pd-3e
c -Pd-1c
c -Pd-3p
c -Pd-4e
4p-ca-ca
4p-Pd-3e
4p-Pd-1c
o -3e-n 
3e-n -c3
3e-Pd-1c
ca-4p-ca
Pd-4e-o 
Pd-4e-n 
1c-Pd-4e
3p-ca-ca
3p-Pd-4e
o -4e-n
4e-n -c3
ca-3p-ca

DIHE

Pd-4p-ca-ca
Pd-3e-n-c3
c -Pd-3e-o #Also the last "o" must be "o "
c -Pd-3e-n #Also the last "n" must be "n " 
c -Pd-4e-o #Also the last "o" must be "o "
c-Pd-4e-n  #Also the last "n" must be "n "  
4p-Pd-3e-o #Also the last "o" must be "o " 
4p-Pd-3e-n #Also the last "n" must be "n " 
o -3e-n-c3
o -3e-Pd-1c
n-3e-Pd-1c
ca-4p-ca-ca
ca-ca-4p-ca
Pd-3p-ca-ca
Pd-4e-n-c3
1c-Pd-4e-o
1c-Pd-4e-n
3p-Pd-4e-o
3p-Pd-4e-n
o -4e-n -c3
ca-3p-ca-ca
ca-ca-3p-ca

IMPROPER

NONBON
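About why the dashes disappear in the attempt from the question: assigning to a field such as $1 makes awk rebuild $0 by joining the fields with OFS, which defaults to a single space. A minimal sketch on one line of the sample, with and without OFS also set to "-" (it still only pads the first field, so it explains the lost dashes but is not a complete solution):

```shell
# Default OFS (a space): assigning to $1 rebuilds $0, so the dashes become spaces
printf '%s\n' 'c-Pd-4p' | awk 'BEGIN { FS = "-" } { if (length($1) == 1) $1 = $1 " " } { print $0 }'
# prints: c  Pd 4p

# With OFS = "-" as well, the rebuilt record keeps its dashes
printf '%s\n' 'c-Pd-4p' | awk 'BEGIN { FS = OFS = "-" } { if (length($1) == 1) $1 = $1 " " } { print $0 }'
# prints: c -Pd-4p
```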

Perl to the rescue!

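perl -pe '/-/ and s/\b(\w)\b/$1 /g' leap.log
  • -p reads the input line by line and prints each line after processing it;
  • /-/ makes the substitution apply only to lines that contain a dash;
  • s/PATTERN/SUBSTITUTION/ works like the one in sed;
  • \w matches a "word character", i.e. a letter, digit, or underscore;
  • \b matches the position where a word starts or ends;
  • $1 in the substitution is the captured single character, so every one-character word gets a space appended.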

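Assumptions:

  • the only lines of interest are the lines that contain a -
  • on the lines of interest, only one (whitespace-delimited) field contains -
  • every string delimited by - needs to be tested, and every string with length()==1 must have a space ( ) appended to the end of it
  • leading whitespace on a line can be ignored/removed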

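One way to check the second assumption before relying on it is to list the dash-containing lines that have more than one whitespace-separated field, e.g. awk '/-/ && NF > 1' leap.log; for the leap.log shown in the question this prints nothing. A small sketch on sample input (the last line is made up, purely to show what a violation looks like):

```shell
# Print every line that contains "-" but has more than one whitespace-separated field.
# 'c-Pd 1.5' is a hypothetical line, not from leap.log.
printf '%s\n' 'Pd 0.000 0.000' 'c-Pd-4p' ' ca-3p-ca-ca' 'c-Pd 1.5' |
  awk '/-/ && NF > 1'
# prints: c-Pd 1.5
```

Leading blanks do not count as a field separator here, since awk's default FS skips them, so the DIHE-style lines pass the check.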
One awk idea (strips leading whitespace):

awk '
/-/ { n=split($1,arr,"-")                        # split field #1 into arr[] array based on "-" delimiter
      x=delim=""
      for (i=1;i<=n;i++) {                         # loop through array
          # piece together our new field
          x=x delim arr[i] ( length(arr[i]) == 1 ? " " : "")
          delim="-"
      }
      $1=x                                         # replace field #1 with value in variable "x"
    }
1
' leap.log

Another awk idea (preserves leading whitespace):

awk '
BEGIN { FS=OFS="-" }                   # define input/output field delimiter == "-"
NF>1  { for (i=1;i<=NF;i++) {          # if more than one "-" delimited field then ...
            old=$i
            gsub(/ /,"",old)           # strip any (leading) spaces from field
            if (length(old) == 1)      # if length() == 1 then ...
               $i=$i " "               # append space to current field
        }
      }
1
' leap.log
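The space appended to a last one-character name (e.g. the "o " in c -Pd-3e-o, which the question's comments ask for) is invisible in printed output. A sketch that makes it visible by marking each line end with sed 's/$/|/', running the second script on two lines from the question (fed with printf instead of leap.log):

```shell
# Run the second awk script, then append "|" to every line so trailing spaces show up
printf '%s\n' ' c-Pd-3e-o' 'Pd-c' |
awk '
BEGIN { FS = OFS = "-" }
NF > 1 { for (i = 1; i <= NF; i++) {
             old = $i
             gsub(/ /, "", old)        # ignore spaces when measuring the name
             if (length(old) == 1)
                 $i = $i " "           # pad one-character names
         }
       }
1' |
sed 's/$/|/'
# prints:
#  c -Pd-3e-o |
# Pd-c |
```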

Both of these produce:

Template_frcmod
MASS

Pd 0.000         0.000

BOND
Pd-c
Pd-3e
c -Pd
4p-ca
o -3e
n -3e
Pd-4e
3p-ca
o -4e
n -4e

ANGLE
Pd-c -Pd
Pd-3e-o
Pd-3e-n
Pd-1c-Pd
c -Pd-4p
c -Pd-3e
c -Pd-1c
c -Pd-3p
c -Pd-4e
4p-ca-ca
4p-Pd-3e
4p-Pd-1c
o -3e-n
3e-n -c3
3e-Pd-1c
ca-4p-ca
Pd-4e-o
Pd-4e-n
1c-Pd-4e
3p-ca-ca
3p-Pd-4e
o -4e-n
4e-n -c3
ca-3p-ca

DIHE

 Pd-4p-ca-ca
 Pd-3e-n -c3
 c -Pd-3e-o
 c -Pd-3e-n
 c -Pd-4e-o
 c -Pd-4e-n
 4p-Pd-3e-o
 4p-Pd-3e-n
 o -3e-n -c3
 o -3e-Pd-1c
 n -3e-Pd-1c
 ca-4p-ca-ca
 ca-ca-4p-ca
 Pd-3p-ca-ca
 Pd-4e-n -c3
 1c-Pd-4e-o
 1c-Pd-4e-n
 3p-Pd-4e-o
 3p-Pd-4e-n
 o -4e-n -c3
 ca-3p-ca-ca
 ca-ca-3p-ca


IMPROPER

NONBON

Note: with the first awk script, the entries under DIHE lose their leading whitespace.
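Since the question also mentions sed, here is a possible sed sketch (not part of the answers above; written for GNU sed with -E). It pads every dash-separated one-character name and keeps leading whitespace. The :a ... ta loop repeats the substitution, because a dash consumed by one match cannot serve as the left delimiter of the next one, so two neighbouring one-character names need separate passes:

```shell
# /-/!b      : leave lines without a dash untouched
# :a ... ta  : repeat the substitution until nothing changes, so neighbouring
#              one-character names (e.g. the hypothetical c-n-o) are all padded
# The s command pads a single non-blank, non-dash character that sits between
# (start of line, a dash or a blank) and (a dash or end of line).
printf '%s\n' 'Pd 0.000 0.000' 'Pd-1c-Pd' ' o-4e-n-c3' 'c-n-o' |
sed -E -e '/-/!b' -e ':a' -e 's/(^|[-[:space:]])([^-[:space:]])(-|$)/\1\2 \3/' -e 'ta'
# Pd 0.000 0.000
# Pd-1c-Pd
#  o -4e-n -c3
# c -n -o       (now ends with a space)
```

For the real file, the printf part would be replaced by reading leap.log.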