Introduce a new line before every row containing a specific word in Linux

Question

I am new to Linux. I have a tab-delimited text file like the one below:

A1 title body.1 gene
A1 head head.1  head
A1 trunk trunk.1 trunk
A1 tail tail.1 tail
A2 title body.2 gene
A2 head head.2 head
A2 trunk trunk.2 trunk
A2 tail tail.2 tail
A3 title body.3 gene
A3 head head.3 head
A3 trunk trunk.3 trunk
A4 title title.4 gene
A4 trunk trunk.4 trunk
A4 tail tail.4 tail

I would like to introduce a new line before every row containing the word "gene" in the last column, as shown below:

A1 title body.1 gene
A1 head head.1  head
A1 trunk trunk.1 trunk
A1 tail tail.1 tail

A2 title body.2 gene
A2 head head.2 head
A2 trunk trunk.2 trunk
A2 tail tail.2 tail

A3 title body.3 gene
A3 head head.3 head
A3 trunk trunk.3 trunk

A4 title title.4 gene
A4 trunk trunk.4 trunk
A4 tail tail.4 tail

I tried the following command:

sed 's/gene/\
\n&\g' file.txt

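But it introduces a new line after the rows containing the word "gene".

It would be great if someone could guide me on how to introduce a new line before the rows containing the word "gene" in the last column.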

Answer 1

Use a backreference:

sed 's/\(^.*gene\)/\n\1/g' file.txt

Answer 2

Just check whether the last field is gene. If so, print an empty line:

awk '$NF=="gene" {print ""}1' file

This returns:

$ awk '$NF=="gene" {print ""}1' file

A1 title body.1 gene
A1 head head.1  head
A1 trunk trunk.1 trunk
A1 tail tail.1 tail

A2 title body.2 gene
A2 head head.2 head
A2 trunk trunk.2 trunk
A2 tail tail.2 tail

A3 title body.3 gene
A3 head head.3 head
A3 trunk trunk.3 trunk

A4 title title.4 gene
A4 trunk trunk.4 trunk
A4 tail tail.4 tail
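
If the blank line before the very first row is unwanted (the expected output in the question has none), the same idea can be restricted to records after the first. The following is only a sketch: the four-row sample data and the temporary file are made up for illustration, and -F'\t' is an assumption that the fields really are tab-separated.

```shell
# Hypothetical sample: a temporary tab-separated file with two "gene" rows.
sample=$(mktemp)
printf 'A1\ttitle\tbody.1\tgene\nA1\thead\thead.1\thead\nA2\ttitle\tbody.2\tgene\nA2\ttail\ttail.2\ttail\n' > "$sample"

# NR>1 skips the first record, so no empty line is printed at the top.
# -F'\t' splits on tabs only; the trailing 1 prints every input line.
result=$(awk -F'\t' 'NR>1 && $NF=="gene" {print ""} 1' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

With this sample, the only empty line should appear between the A1 block and the A2 block.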

Answer 3

You probably want something like this (extended regular expression syntax):

$ sed -r 's/(^.*?\tgene$)/\n\1/' example

A1  title   body.1  gene
A1  head    head.1  head
A1  trunk   trunk.1 trunk
A1  tail    tail.1  tail

A2  title   body.2  gene
A2  head    head.2  head
A2  trunk   trunk.2 trunk
A2  tail    tail.2  tail

A3  title   body.3  gene
A3  head    head.3  head
A3  trunk   trunk.3 trunk

A4  title   title.4 gene
A4  trunk   trunk.4 trunk
A4  tail    tail.4  tail

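In this command you can see:

The substitution command 's/.../.../'
A group capturing the whole line that ends with a tab followed by gene: (^.*?\tgene$).
A newline followed by the previously captured group (the first and only one), inserted into the result: \n\1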
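
The pieces listed above can be checked on a tiny input. This is a minimal sketch that assumes GNU sed, since \t in the pattern and \n in the replacement are GNU extensions; the three-row sample and the temporary file are invented for the demonstration.

```shell
# Hypothetical three-row, tab-separated sample.
sample=$(mktemp)
printf 'A1\ttitle\tbody.1\tgene\nA1\thead\thead.1\thead\nA2\ttitle\tbody.2\tgene\n' > "$sample"

# Capture the whole line when it ends in <tab>gene, then emit a newline
# followed by the captured text (\1). Other lines pass through unchanged.
out=$(sed -E 's/^(.*\tgene)$/\n\1/' "$sample")
printf '%s\n' "$out"

rm -f "$sample"
```

Without \1 in the replacement, the matched row would be replaced by a bare newline and its contents would be lost.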

Note that there is a problem with your question:

I would like introduce a new line before every row containing word "gene" in the last column

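This leads to the assumption that you need the first line of the result to be empty (or, to be precise, a newline).

But the first line of your example output clearly has no empty line before it. If that is really what you need, you should use sed addressing: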

pono@pono-carbon:~$ sed -r '2,$s/(^.*?\tgene$)/\n\1/' example
A1  title   body.1  gene
A1  head    head.1  head
A1  trunk   trunk.1 trunk
A1  tail    tail.1  tail

A2  title   body.2  gene
A2  head    head.2  head
A2  trunk   trunk.2 trunk
A2  tail    tail.2  tail

A3  title   body.3  gene
A3  head    head.3  head
A3  trunk   trunk.3 trunk

A4  title   title.4 gene
A4  trunk   trunk.4 trunk
A4  tail    tail.4  tail

Answer 4

In sed you can use the insert command i:

sed '2,${/[\t ]gene$/i\

;}' file

The 2,$ address is used to avoid adding a leading newline at the beginning.
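
The same 2,$ address can also wrap a substitution instead of the i command. The sketch below assumes GNU sed (for \n in the replacement) and uses an invented four-row sample; [[:blank:]] is swapped in for [\t ] because a backslash inside a bracket expression may be taken literally by some sed implementations.

```shell
# Hypothetical four-row, tab-separated sample.
sample=$(mktemp)
printf 'A1\ttitle\tbody.1\tgene\nA1\thead\thead.1\thead\nA2\ttitle\tbody.2\tgene\nA2\ttail\ttail.2\ttail\n' > "$sample"

# 2,$ limits the block to lines 2 through the last one, so line 1 is
# never touched. Inside the block, rows ending in <blank>gene get a
# newline prepended via s/^/\n/.
out=$(sed '2,${/[[:blank:]]gene$/s/^/\n/;}' "$sample")
printf '%s\n' "$out"

rm -f "$sample"
```

Here the first gene row stays on line 1 and only the second one is preceded by an empty line.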
