How to add one space character (without changing any other characters) to one-character strings using awk, sed, or grep?
I produced this text file (leap.log) with sed and awk:
Template_frcmod
MASS
Pd 0.000 0.000
BOND
Pd-c
Pd-3e
c-Pd
4p-ca
o-3e
n-3e
Pd-4e
3p-ca
o-4e
n-4e
ANGLE
Pd-c-Pd
Pd-3e-o
Pd-3e-n
Pd-1c-Pd
c-Pd-4p
c-Pd-3e
c-Pd-1c
c-Pd-3p
c-Pd-4e
4p-ca-ca
4p-Pd-3e
4p-Pd-1c
o-3e-n
3e-n-c3
3e-Pd-1c
ca-4p-ca
Pd-4e-o
Pd-4e-n
1c-Pd-4e
3p-ca-ca
3p-Pd-4e
o-4e-n
4e-n-c3
ca-3p-ca
DIHE
Pd-4p-ca-ca
Pd-3e-n-c3
c-Pd-3e-o
c-Pd-3e-n
c-Pd-4e-o
c-Pd-4e-n
4p-Pd-3e-o
4p-Pd-3e-n
o-3e-n-c3
o-3e-Pd-1c
n-3e-Pd-1c
ca-4p-ca-ca
ca-ca-4p-ca
Pd-3p-ca-ca
Pd-4e-n-c3
1c-Pd-4e-o
1c-Pd-4e-n
3p-Pd-4e-o
3p-Pd-4e-n
o-4e-n-c3
ca-3p-ca-ca
ca-ca-3p-ca
IMPROPER
NONBON
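Now I have a problem with one-character atom names:
c-Pd-4p
In this line, and in every similar line that contains a one-character atom name, "c" must be two characters, "c " (with a trailing space):
c -Pd-4p
Or in this line:
4e-n-c3
"n" must be "n ":
4e-n -c3
Or in this line, "Pd-c" must be "Pd-c ".
Etc. Every one-character atom name must become two characters by getting a trailing space.
When I try to change "c" to "c ", "1c" also turns into "1c ":
Pd-1c-Pd --> Pd-1c -Pd
But I do not want to change the two-character atom names. They must stay the same.
When I tried this command:
awk 'BEGIN{FS="-"}{ if(length($1) == 1 ) $1 = $1 " " } {print $0}' leap.log
the "-" signs disappeared. What should I do to pad every one-character atom name with a space?
Expected result (the comments are only for this question; the real file has no comments):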
Template_frcmod
MASS
Pd 0.000 0.000
BOND
Pd-c #Also the last "c" must be "c "
Pd-3e
c -Pd
4p-ca
o -3e
n -3e
Pd-4e
3p-ca
o -4e
n -4e
ANGLE
Pd-c -Pd
Pd-3e-o
Pd-3e-n
Pd-1c-Pd
c -Pd-4p
c -Pd-3e
c -Pd-1c
c -Pd-3p
c -Pd-4e
4p-ca-ca
4p-Pd-3e
4p-Pd-1c
o -3e-n
3e-n -c3
3e-Pd-1c
ca-4p-ca
Pd-4e-o
Pd-4e-n
1c-Pd-4e
3p-ca-ca
3p-Pd-4e
o -4e-n
4e-n -c3
ca-3p-ca
DIHE
Pd-4p-ca-ca
Pd-3e-n-c3
c -Pd-3e-o #Also the last "o" must be "o "
c -Pd-3e-n #Also the last "n" must be "n "
c -Pd-4e-o #Also the last "o" must be "o "
c-Pd-4e-n #Also the last "n" must be "n "
4p-Pd-3e-o #Also the last "o" must be "o "
4p-Pd-3e-n #Also the last "n" must be "n "
o -3e-n-c3
o -3e-Pd-1c
n-3e-Pd-1c
ca-4p-ca-ca
ca-ca-4p-ca
Pd-3p-ca-ca
Pd-4e-n-c3
1c-Pd-4e-o
1c-Pd-4e-n
3p-Pd-4e-o
3p-Pd-4e-n
o -4e-n -c3
ca-3p-ca-ca
ca-ca-3p-ca
IMPROPER
NONBON
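Perl to the rescue!
perl -pe '/-/ and s/\b(\w)\b/$1 /g' leap.log
-p reads the input line by line and prints each line after processing;
/-/ restricts the substitution to lines that contain a dash;
s/PATTERN/SUBSTITUTION/ works like the sed command of the same name;
\w matches a "word character", i.e. a letter, digit, or underscore;
\b matches at a word boundary, i.e. the position where a word starts or ends.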
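Since the question also asks about sed, the same substitution can be sketched there. This assumes GNU sed, because \b and \w are GNU extensions rather than POSIX, and pad_single_sed is just an illustrative wrapper name. Like the Perl version, it treats any non-word character as a boundary, so it is only safe while the dash lines contain nothing but atom names.

```shell
# Assumption: GNU sed (\b and \w are GNU extensions, not POSIX).
# pad_single_sed is a hypothetical wrapper name used only for this example.
pad_single_sed() {
    # On lines containing "-", append a space to every one-character word.
    sed -E '/-/ s/\b(\w)\b/\1 /g' "$@"
}

# Small sample taken from leap.log; "1c" and "c3" are left untouched.
printf '%s\n' 'Pd 0.000 0.000' 'Pd-1c-Pd' 'c-Pd-4p' '4e-n-c3' | pad_single_sed
```

On the real file it would be called as pad_single_sed leap.log.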
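Assumptions:
- the only lines of interest are the ones that contain -
- on a line of interest, only one white-space-separated field contains -
- every - delimited string must be tested, and every string with length()==1 must get a space ( ) appended to it
- leading white space on a line can be ignored/removed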
One awk idea (loses leading white space):
awk '
/-/ { n=split($1,arr,"-")                       # split field #1 into arr[] array based on "-" delimiter
      x=delim=""
      for (i=1;i<=n;i++) {                      # loop through array
          # piece together our new field
          x=x delim arr[i] ( length(arr[i]) == 1 ? " " : "")
          delim="-"
      }
      $1=x                                      # replace field #1 with value in variable "x"
    }
1
' leap.log
Another awk idea (keeps leading white space):
awk '
BEGIN { FS=OFS="-" }                            # define input/output field delimiter == "-"
NF>1  { for (i=1;i<=NF;i++) {                   # if more than one "-" delimited field then ...
            old=$i
            gsub(/ /,"",old)                    # strip any (leading) spaces from a copy of the field
            if (length(old) == 1)               # if length() == 1 then ...
               $i=$i " "                        # append space to current field
        }
      }
1
' leap.log
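The FS=OFS="-" assignment is also what explains the disappearing dashes in the command from the question. Assigning to any field makes awk rebuild the whole record, joining the fields with OFS, and OFS defaults to a single space. A minimal demonstration (the two function names are made up for this example):

```shell
# Input separator only: the rebuilt record uses the default OFS, a single space.
rebuild_default_ofs() { awk 'BEGIN { FS = "-" } { $1 = $1 " " } 1'; }

# Input and output separator both "-": the dashes survive the rebuild.
rebuild_dash_ofs() { awk 'BEGIN { FS = OFS = "-" } { $1 = $1 " " } 1'; }

printf 'c-Pd-4p\n' | rebuild_default_ofs   # prints "c  Pd 4p"
printf 'c-Pd-4p\n' | rebuild_dash_ofs      # prints "c -Pd-4p"
```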
Both awk scripts generate:
Template_frcmod
MASS
Pd 0.000 0.000
BOND
Pd-c
Pd-3e
c -Pd
4p-ca
o -3e
n -3e
Pd-4e
3p-ca
o -4e
n -4e
ANGLE
Pd-c -Pd
Pd-3e-o
Pd-3e-n
Pd-1c-Pd
c -Pd-4p
c -Pd-3e
c -Pd-1c
c -Pd-3p
c -Pd-4e
4p-ca-ca
4p-Pd-3e
4p-Pd-1c
o -3e-n
3e-n -c3
3e-Pd-1c
ca-4p-ca
Pd-4e-o
Pd-4e-n
1c-Pd-4e
3p-ca-ca
3p-Pd-4e
o -4e-n
4e-n -c3
ca-3p-ca
DIHE
Pd-4p-ca-ca
Pd-3e-n -c3
c -Pd-3e-o
c -Pd-3e-n
c -Pd-4e-o
c -Pd-4e-n
4p-Pd-3e-o
4p-Pd-3e-n
o -3e-n -c3
o -3e-Pd-1c
n -3e-Pd-1c
ca-4p-ca-ca
ca-ca-4p-ca
Pd-3p-ca-ca
Pd-4e-n -c3
1c-Pd-4e-o
1c-Pd-4e-n
3p-Pd-4e-o
3p-Pd-4e-n
o -4e-n -c3
ca-3p-ca-ca
ca-ca-3p-ca
IMPROPER
NONBON
NOTE: with the first awk script, the entries under DIHE lose their leading white space.
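To double-check the result, one option is a small awk filter that reports any line where a "-" delimited name is still shorter than two characters. This is only a sketch: check_padding is a made-up name, and it assumes that leading white space is indentation and that every line containing a dash consists of atom names only.

```shell
# check_padding is a hypothetical helper name, not part of any tool.
# It prints "LINE: text" for every line that still has a "-" delimited
# name shorter than two characters, and exits non-zero if any were found.
check_padding() {
    awk -F- '
    NF > 1 {
        for (i = 1; i <= NF; i++) {
            f = $i
            sub(/^[ \t]+/, "", f)          # ignore leading indentation only
            if (length(f) < 2) {
                print FNR ": " $0
                bad = 1
                break
            }
        }
    }
    END { exit bad + 0 }
    ' "$@"
}

# Already padded sample lines: nothing is reported and the check succeeds.
printf '%s\n' 'c -Pd-4p' 'Pd-1c-Pd' 'Pd 0.000 0.000' | check_padding && echo "all names padded"
```

It could be run on the converted file, for example after redirecting the output of one of the scripts above to a new file.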