Split pcregrep multiline matches
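tl;dr: How can I split each multiline match with pcregrep?
Long version: My file contains lines that start with a lowercase letter and lines that start with a digit or a special character. Whenever at least two consecutive lines start with a lowercase letter, I want them in my output. However, I want each match to be delimited/split instead of being appended to the previous one.
This is the regex: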
pcregrep -M "([a-z][^\n]*\n){2,}"
So if I feed it a file like this:
-- Header --
info1
info2
something
< not interesting >
dont need this
+ new section
additional 1
additional 2
the result is:
info1
info2
something
additional 1
additional 2
However, what I want is:
info1
info2
something

additional 1
additional 2
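Is this possible, and/or do I have to switch to Python (or similar)? Even if the advice is to use something else from here on, it would still be good to know first whether it can be done.
Thanks!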
The following sed command seems to do the trick:
sed -n '/^[a-z]/N;/^[a-z].*\n[a-z]/{p;:l n;/^[a-z]/{p;bl};a\

}'
Explanation:
/^[a-z]/{                 # if a line starts with a lowercase letter
    N;                    # append the next line to the pattern space, keeping the current one
    /^[a-z].*\n[a-z]/{    # test whether the second line also starts with a lowercase letter
        p;                # print both lines in the pattern space
        :l n;             # define the label "l" and read the next line
        /^[a-z]/{         # if the new line still starts with a lowercase letter
            p;            # print it
            bl            # jump back to label "l"
        }
        # append an empty line after each matched group
        a\

    }
}
$ echo '-- Header --
> info1
> info2
> something
> < not interesting >
> dont need this
> + new section
> additional 1
> additional 2 ' | sed -n '/^[a-z]/N;/^[a-z].*\n[a-z]/{p;:l n;/^[a-z]/{p;bl};a\
>
> }'
info1
info2
something

additional 1
additional 2
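For comparison, here is a minimal awk sketch of the same grouping, which may be handy if your sed does not accept the `:l n` label syntax on one line. It is not part of the answer above: the helper name `split_groups` and the variable names are made up for illustration, and it assumes that `[a-z]` matches only lowercase ASCII letters in your locale. It buffers each run of lowercase-initial lines and prints the run only if it holds at least two lines, with an empty line between runs.

```shell
# split_groups: hypothetical helper name, not from the original answer.
# Prints every run of two or more consecutive lines that start with a
# lowercase letter, with an empty line between runs.
split_groups() {
  awk '
    function emit_run() {
      if (n >= 2) {                    # keep only runs of at least two lines
        if (printed) print ""          # separator between runs, none before the first
        for (i = 1; i <= n; i++) print buf[i]
        printed = 1
      }
      n = 0                            # start a fresh run
    }
    /^[a-z]/ { buf[++n] = $0; next }   # extend the current run
    { emit_run() }                     # any other line ends the run
    END { emit_run() }                 # the input may end inside a run
  ' "$@"
}

# Usage with the sample input from the question:
printf '%s\n' '-- Header --' info1 info2 something '< not interesting >' \
  'dont need this' '+ new section' 'additional 1' 'additional 2' | split_groups
```

Unlike the sed version, this sketch decides whether to print a run only once the run has ended, so a single lowercase line is dropped without having to peek at the next line.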