Process text file by indented pattern

Question

I have tried a few combinations of sed with s/regex/../, but without success. So here is my question: I have a text file that looks like this (PCLint output):

--- Module A
    Info: indented message 1
    Note: indented message 2
    Warning: indented message 3
--- Module B
--- Module C
    Info: indented message 1
--- Module D

I want to transform it into the following (TeamCity service messages):

[Start Module="Module A"]
    [Message Content="Info: indented message 1"]
    [Message Content="Note: indented message 2"]
    [Message Content="Warning: indented message 3"]
[End Module="Module A"]
[Start Module="Module B"]
[End Module="Module B"]
[Start Module="Module C"]
    [Message Content="Info: indented message 1"]
[End Module="Module C"]
[Start Module="Module D"]
[End Module="Module D"]

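So I know the text has to be split into blocks at each "---", and that each block then has to be wrapped/substituted using regular expressions. But I have no idea how to do this effectively. Ideally I would like to keep it "simple" and use tools available in busybox, such as sed or awk (it needs to work on Win64).

I am comfortable with regular expressions, but I cannot work out how to scope them to one block at a time. Any hints for me?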

Answer 1

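Awk can do this. You need one clause matching /^---/ that sets a variable recording which module you are in, and that prints the end line for the previous module (if there is one) followed by the start line for the next module. A second clause then prints the message lines, and an END clause closes the last module.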

$ cat input | awk '/^---/ { IFS=" "; oldM=M; M=$3; if (oldM) { print "[End Module=\"Module " oldM "\"]"; }; print "[Begin Module=\"Module " M "\"]"; } /^    (.*)$/ { gsub(/^ +/, "", $0); print "    [Message Content=\"" $0 "\"]"; } END { print "[End Module=\"Module " M "\"]"; }'
[Begin Module="Module A"]
    [Message Content="Info: indented message 1"]
    [Message Content="Note: indented message 2"]
    [Message Content="Warning: indented message 3"]
[End Module="Module A"]
[Begin Module="Module B"]
[End Module="Module B"]
[Begin Module="Module C"]
    [Message Content="Info: indented message 1"]
[End Module="Module C"]
[Begin Module="Module D"]
[End Module="Module D"]
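For readability, the same idea can be laid out as a multi-line script. This is only a sketch under a few assumptions that are not part of the answer above: it emits `Start` instead of `Begin` to match the tags requested in the question, it takes the module name as everything after `--- ` so that names containing spaces survive, and the wrapper name `lint_to_teamcity` is made up for illustration.

```shell
# Hypothetical wrapper; reads PC-lint style text from file arguments or stdin.
lint_to_teamcity() {
    awk '
    # Module header: close the previous module (if any), then open the new one.
    /^---/ {
        if (module != "") print "[End Module=\"" module "\"]"
        module = substr($0, 5)      # text after "--- ", e.g. "Module A"
        print "[Start Module=\"" module "\"]"
        next
    }
    # Indented line: strip the leading blanks and wrap the message.
    /^ +/ {
        sub(/^ +/, "")
        print "    [Message Content=\"" $0 "\"]"
    }
    # End of input: close the module that is still open.
    END {
        if (module != "") print "[End Module=\"" module "\"]"
    }
    ' "$@"
}
```

It would be called as `lint_to_teamcity file.txt`, or with the lint output piped in on stdin.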

Answer 2

Here is a sed script for that purpose:

translate.sed:

:a
/Module/ {
    x
    s/.*Module (.*)/[End Module="\1"]/p
    x
    h
    s/(--- )(.*)/[Start Module="\2"]/p
    :b
    n
    /Module/! {
        s/(\s*)(.*)/\1[Message Content="\2"]/p
        bb
    }
    /Module/ {
        $!ba
        # last line: close the previous module first
        x
        s/.*Module (.*)/[End Module="\1"]/p
        x
        h
        s/(--- )(.*)/[Start Module="\2"]/p
        x
        s/.*Module (.*)/[End Module="\1"]/p
    }
}

Run it like this:

sed -nrf translate.sed file.txt

Output:

[Start Module="Module A"]
    [Message Content="Info: indented message 1"]
    [Message Content="Note: indented message 2"]
    [Message Content="Warning: indented message 3"]
[End Module="A"]
[Start Module="Module B"]
[End Module="B"]
[Start Module="Module C"]
    [Message Content="Info: indented message 1"]
[End Module="C"]
[Start Module="Module D"]
[End Module="D"]

Here is the same script with explanatory comments added:

translate.sed

# Define label 'a' to iterate over modules
:a

# If the line is a module header ...
/Module/ {
    # Swap contents of hold and pattern buffer (current line)
    x

    # If the pattern buffer (former hold buffer)
    # contains something, it is the previous module header.
    # Create an end tag out of it.
    s/.*Module (.*)/[End Module="\1"]/p

    # Get the current line back from hold buffer
    x

    # Back up the current line in the hold buffer
    h

    # Create a start module tag
    s/(--- )(.*)/[Start Module="\2"]/p

    # Create a label to iterate over messages
    :b

    # Get next line from input into pattern buffer
    # (Overwrite the pattern buffer)
    n

    # If it is not a module starting line ...
    /Module/! {

        # ... wrap it into the Message Content tag,
        # keeping the original indentation
        s/(\s*)(.*)/\1[Message Content="\2"]/p

        # and go on with the next line (step back to b)
        bb
    }

    /Module/ {
        # if it is not the last line
        # go on with the next module (step back to a)
        $!ba

        # on the last line ...

        # close the previous module first
        x
        s/.*Module (.*)/[End Module="\1"]/p
        x

        # back up the current line in the hold buffer
        h

        # create start tag
        s/(--- )(.*)/[Start Module="\2"]/p

        # swap hold and pattern buffer
        x

        # create the end tag
        s/.*Module (.*)/[End Module="\1"]/p
    }
}

By the way, it can of course also be written as a one-liner :D

sed -rn ':a;/Module/{;x;s/.*Module (.*)/[End Module="\1"]/p;x;h;s/(--- )(.*)/[Start Module="\2"]/p;:b;n;/Module/!{;s/(\s*)(.*)/\1[Message Content="\2"]/p;bb;};/Module/{;$!ba;x;s/.*Module (.*)/[End Module="\1"]/p;x;h;s/(--- )(.*)/[Start Module="\2"]/p;x;s/.*Module (.*)/[End Module="\1"]/p;};};' file.txt

Answer 3

sed '# prepare loading
   s/^--- Module \(.*\)/[Start Module="\1"]\
[End Module="\1"]/
   s/^\([[:space:]]\{4\}\)\(.*\)/\1[Message Content="\2"]/
   H;$!d

# permutation
   x;s/\n/²/g;s/$/²/
:cycle
   s/²\(\[End[^²]*\)²\([[:space:]][^²]*\)²/²\2²\1²/g
   t cycle
   s/.//;s/.$//;s/²/\
/g
' YourFile

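This works by repeated substitution in a loop:

Rewrite the module and message lines.
Append each line to the hold buffer and read the next one (d, so nothing is printed).
On the last line, load the hold buffer content.
Replace newlines with another character (because [^\n] is not available in POSIX sed).
Swap each End Module line with the message line that follows it, as often as necessary.
Restore the newlines (and remove the surplus ones).
Print the result.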

Normally, on GNU sed the newline replacement is not needed, so you can use \n directly in place of ² in the code and drop the substitution steps.
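The GNU sed simplification just described might look like the following. Treat it as a sketch, not a tested drop-in: it assumes GNU sed (for `\n` in the replacement and inside a bracket expression) and input without blank lines, and the wrapper name `lint_blocks_gnu_sed` is invented for illustration.

```shell
# Hypothetical wrapper; assumes GNU sed and input without blank lines.
lint_blocks_gnu_sed() {
    sed '
   # rewrite header and message lines, then collect everything in the hold buffer
   s/^--- Module \(.*\)/[Start Module="\1"]\
[End Module="\1"]/
   s/^\([[:space:]]\{4\}\)\(.*\)/\1[Message Content="\2"]/
   H;$!d

   # last line: fetch the collected text and add a trailing newline as delimiter
   x;s/$/\n/
:cycle
   # move each End line below the message line that follows it
   s/\n\(\[End[^\n]*\)\n\([[:space:]][^\n]*\)\n/\n\2\n\1\n/g
   t cycle
   # drop the leading and trailing delimiter newlines
   s/.//;s/.$//
' "$@"
}
```

As in the script above, the module tags carry only the part of the name after `Module `.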
