Using SED to replace specific patterns found within parentheses?

Question

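I'm having a bit of trouble with this... I'm trying to process the following text with a Bash script (Sed in particular). Other approaches are welcome, of course! But I'm hoping for a Bash solution...

Tricky input: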

("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"

Desired output:

("a"|"b"|"c")."ABC".("e"|"f")."EF"

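Mainly, I think what I want to do is replace the string "|" with nothing, but restrict the change to text outside of any existing parentheses.

The problem gets even crazier with the different forms of text input in my data set. As in this example, the blocks (separated by .) come in all sorts of combinations of parenthesized and non-parenthesized.

Thanks in advance.

What I have tried with SED: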

gsed -E "s/(\.\"[[:graph:]]+)\"\|\"//g" input.txt

The output I get is:

("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."EF"

It looks like I'm only getting part of the desired output... the replacement only hits a limited range...

Answer 1

Would you please try the following:

#!/bin/bash

awk 'BEGIN {FS = OFS = "."}                     # use "." as a field separator
{
    for (i = 1; i <= NF; i++) {                 # loop over the fields
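        if ($i !~ "^\\(.+\\)$") {               # if the field is not enclosed in "(" and ")"
            gsub("\"\\|\"", "", $i)             # then remove every "|"
        }
    }
    print
}' <<< '("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"'

Output:

("a"|"b"|"c")."ABC".("e"|"f")."EF"

[Explanation]

The BEGIN {} block is executed only once, before the input is processed. It is useful for initializing variables.
Because the awk variable FS is set to ".", each input line is automatically split on ".". $1 is then assigned the first field ("a"|"b"|"c"), $2 is assigned the second field "A"|"B"|"C", and so on. The awk variable NF is set to the number of fields (4 in this case).
The for loop for (i = 1; i <= NF; i++) iterates over the fields to examine $1, $2, ... in turn.
The regex "^\\(.+\\)$" matches if $i, the value of the i-th field, starts with ( and ends with ). The operator !~ negates the match, so the if condition is met by fields that are not enclosed in parentheses, such as "A"|"B"|"C".
The function gsub("\"\\|\"", "", $i) removes every occurrence of the substring "|" in $i. The character " has to be escaped with \ and | has to be escaped with \\, which makes the code somewhat harder to read.
The final print is shorthand for print $0. It prints the modified line rebuilt from the fields $1, $2, ... $4, joined with OFS, which is also set to ".".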
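
To make the field splitting described above easier to follow, here is a small sketch that only prints what awk sees and changes nothing. The helper name show_fields and the keep/edit labels are made up for illustration and are not part of the answer.

```shell
#!/bin/bash

# Hypothetical helper (not from the answer): show how awk splits a line on "."
show_fields() {
    awk 'BEGIN { FS = "." }                      # split each input line on literal periods
    {
        printf "NF=%d\n", NF                     # number of fields on this line
        for (i = 1; i <= NF; i++) {
            # "keep" = field is wrapped in ( ), "edit" = field would be modified
            kind = ($i ~ /^\(.+\)$/) ? "keep" : "edit"
            printf "$%d [%s] %s\n", i, kind, $i
        }
    }' "$@"
}

printf '%s\n' '("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"' | show_fields
# NF=4
# $1 [keep] ("a"|"b"|"c")
# $2 [edit] "A"|"B"|"C"
# $3 [keep] ("e"|"f")
# $4 [edit] "E"|"F"
```

Only the fields marked edit are the ones the gsub call touches.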

Answer 2

Assumptions/understandings:

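fields are separated by periods
fields wrapped in parentheses are to be left alone
all other fields get leading/trailing double quotes, while all other double quotes as well as pipes are removed

Sample data: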

$ cat pipes.dat
("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"
"j"|"K"|"L"."m"|"n"|"o"|"p".("x"|"y"|"z")

One awk idea (gensub requires GNU awk):

awk '
BEGIN { FS=OFS="." }                                      # define input/output field separator as a period

      { printf "############\nbefore: %s\n",$0            # print a record separator and the current input line;
                                                          # solely for display purposes; this line can
                                                          # be removed/commented-out once logic is verified

        for (i=1; i<=NF; i++)                             # loop through fields
            if ( $i !~ "^[(].*[)]$" )                     # if field does not start/end with parens then ...
                $i="\"" gensub(/"|\|/,"","g",$i) "\""     # replace field with a new double quote (+) modified string
                                                          # whereby all double quotes and pipes are removed (+)
                                                          # a new ending double quote

        printf "after : %s\n",$0                          # print the newly modified line;
                                                          # can be replaced with "print" once logic is verified
      }
' pipes.dat                                               # read data from file; to read from a variable remove this line and ...
#' <<< "${variable_name}"                                 # uncomment this line

The above generates:

############
before: ("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"
after : ("a"|"b"|"c")."ABC".("e"|"f")."EF"
############
before: "j"|"K"|"L"."m"|"n"|"o"|"p".("x"|"y"|"z")
after : "jKL"."mnop".("x"|"y"|"z")

After removing the comments and making the printf changes:

awk '
BEGIN { FS=OFS="." }
      { for (i=1; i<=NF; i++)
            if ( $i !~ "^[(].*[)]$" )
                $i="\"" gensub(/"|\|/,"","g",$i) "\"" 
        print
      }
' pipes.dat
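
As an aside, gensub is a GNU awk extension, so the script above may fail on awks that lack it (for example the stock awk on macOS). A rough equivalent using only gsub could look like the sketch below. The function name join_quoted_fields is hypothetical, and the sketch relies on the same assumptions listed at the top of this answer.

```shell
#!/bin/bash

# Hypothetical wrapper (not from the answer): same logic, but with gsub instead of gensub
join_quoted_fields() {
    awk 'BEGIN { FS = OFS = "." }
    {
        for (i = 1; i <= NF; i++)
            if ($i !~ /^[(].*[)]$/) {            # skip fields wrapped in parentheses
                gsub(/"|[|]/, "", $i)            # strip every double quote and pipe in place
                $i = "\"" $i "\""                # re-wrap the field in one pair of double quotes
            }
        print
    }' "$@"
}

# usage: join_quoted_fields pipes.dat
printf '%s\n' '"j"|"K"|"L"."m"|"n"|"o"|"p".("x"|"y"|"z")' | join_quoted_fields
# "jKL"."mnop".("x"|"y"|"z")
```

For the two sample lines in pipes.dat this should print the same result as the gensub version.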

This generates:

("a"|"b"|"c")."ABC".("e"|"f")."EF"
"jKL"."mnop".("x"|"y"|"z")

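Since the question asks about sed specifically, here is one possible sed-only sketch, offered as an alternative rather than as either answerer's method. It assumes parentheses are balanced and never nested, and that sed supports -E (GNU sed, i.e. gsed on macOS, or BSD sed). The function name strip_outer_pipes is made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: remove "|" (quote, pipe, quote) only outside parentheses.
# The prefix group matches a run of complete ( ... ) groups and non-paren characters,
# so the "|" that follows it can never sit inside parentheses.
# Each pass removes one "|"; "ta" jumps back to label "a" until nothing is left to replace.
strip_outer_pipes() {
    sed -E -e ':a' \
           -e 's/^((\([^()]*\)|[^()])*)"[|]"/\1/' \
           -e 'ta' "$@"
}

printf '%s\n' '("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"' | strip_outer_pipes
# ("a"|"b"|"c")."ABC".("e"|"f")."EF"
```

The loop is needed because a single s///g cannot work here: every match is anchored at the start of the line so that the parentheses can be tracked, which means only one "|" can be removed per pass.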