Double new line as delimiter in awk and tail function

Question

I need to split a text file that looks like this:

1
00:01:03:321 --> 00:01:04:321
 Randomtext1

2
00:02:03:321 --> 00:03:04:321
 Randomtext2
Still random text2
3rd line of randomtext2

3
00:04:03:321 --> 00:05:04:321
 Randomtext3
Stillrand

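Each block consists of an ordinal-number line, a timer line, and one or more content lines. I want to split the file into these blocks and then delete the ordinal-number line. By splitting into blocks, I mean that I want to treat all of these lines as one record (so it may be easier to delete the first two lines of each record: the blank line and the ordinal-number line). This is my code: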

#!/bin/bash
name=text.sub
name2=text2.sub
awk '
BEGIN {FS="\n\n";

}
{ 
tail -n+1 ;

}' $name > $name2

The expected output would be:

00:01:03:321 --> 00:01:04:321
 Randomtext1
00:02:03:321 --> 00:03:04:321
 Randomtext2
Still random text2
3rd line of randomtext2
00:04:03:321 --> 00:05:04:321
 Randomtext3
Stillrand

Answer 1

You can do this with the following awk script:

script.awk

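BEGIN { FS = "\n"      # each line of a block is one field
        RS = "\n\n"    # blocks are separated by an empty line
      }

      { print $2       # the timing line
        print $3       # the first text line
      }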

Run it like this: awk -f script.awk text.sub > text2.sub

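By defining the field separator FS as a newline and the record separator RS as a double newline, each block becomes one record and its lines become the usual fields $1, $2, $3.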
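The same idea can be sketched so that it also covers blocks with more than one text line, as in the question's sample. This is only a sketch under a few assumptions: it uses RS="" (awk's paragraph mode, where blank lines separate records) instead of the multi-character RS="\n\n", which not every awk treats as a full string, and the temporary file stands in for text.sub.

```shell
# Sample input with the same shape as the file in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1
00:01:03:321 --> 00:01:04:321
 Randomtext1

2
00:02:03:321 --> 00:03:04:321
 Randomtext2
Still random text2
3rd line of randomtext2

3
00:04:03:321 --> 00:05:04:321
 Randomtext3
Stillrand
EOF

# RS=""  : paragraph mode, records are separated by blank lines.
# FS="\n": every line of a block is one field, so $1 is the number line.
# The loop prints every field except $1, however many text lines there are.
out=$(awk 'BEGIN { RS = ""; FS = "\n" }
           { for (i = 2; i <= NF; i++) print $i }' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```

Looping from field 2 to NF, rather than printing $2 and $3 only, keeps the second and third text lines of block 2.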

Answer 2

I would treat each section like this:

1
00:01:03:321 --> 00:01:04:321
 Randomtext1

as an individual record.

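You can use the record and field separators to achieve this, like so:

awk '{$1=""}1' RS='' FS='\n' OFS='\n' file

RS='' is the input record separator. The empty string has a special meaning: records are separated by blank lines (that is, \n\n).
FS='\n' sets the input field separator to a newline.
OFS='\n' sets the output field separator to a newline.

The program {$1=""} erases the first field (the number), and 1 prints the record.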
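One detail worth checking: emptying the first field does not remove it, so the rebuilt record still begins with an empty field followed by the output field separator, which shows up as an empty line before each block. The sketch below compares that command with a variant that deletes the number line from the record itself. The sub() variant is an illustration added here, not part of the answer, and it assumes the first line of every block consists only of digits.

```shell
# Two sample blocks, written to a temporary file that stands in for "file".
sample=$(mktemp)
cat > "$sample" <<'EOF'
1
00:01:03:321 --> 00:01:04:321
 Randomtext1

2
00:02:03:321 --> 00:03:04:321
 Randomtext2
EOF

# The command from the answer: the first field is emptied, and the record is
# rebuilt as "" OFS $2 OFS $3, so every block starts with an empty line.
with_blank=$(awk '{$1=""}1' RS='' FS='\n' OFS='\n' "$sample")

# Variant: remove the leading "digits + newline" from the record text itself,
# so no empty first field is left behind.
no_blank=$(awk 'BEGIN { RS = "" } { sub(/^[0-9]+\n/, "") } 1' "$sample")

printf '%s\n---\n%s\n' "$with_blank" "$no_blank"
rm -f "$sample"
```

If the empty lines between blocks are acceptable, the answer's one-liner is enough; the variant is only needed when the output must match the expected output in the question exactly.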

Answer 3

Input

1
00:01:03:321 --> 00:01:04:321
 Randomtext1

2
00:02:03:321 --> 00:03:04:321
 Randomtext2

Script

 awk 'BEGIN{RS="";FS="\n"}{printf "%s\n%s\n",$2,$3}' file

Output

00:01:03:321 --> 00:01:04:321
 Randomtext1
00:02:03:321 --> 00:03:04:321
 Randomtext2

Answer 4

$ awk 'NR%4~/^[23]$/' file
00:01:03:321 --> 00:01:04:321
 Randomtext1
00:02:03:321 --> 00:03:04:321
 Randomtext2

If that's not what you want, edit your question to provide more realistic sample input/output.

Answer 5

How about this:

$ sed -n '2~4p;3~4p' file

00:01:03:321 --> 00:01:04:321
 Randomtext1
00:02:03:321 --> 00:03:04:321
 Randomtext2

This prints every 4th line, once starting from line 2 and once starting from line 3.
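The first~step address form is a GNU sed extension, so it may not be available in other sed implementations. As a rough illustration of which line numbers are selected, the sketch below applies the equivalent modulo test in awk to the numbers 1 through 12 (a made-up input, chosen only to make the selected positions visible).

```shell
# Generate the numbers 1..12, one per line, then keep the lines whose
# line number leaves remainder 2 or 3 when divided by 4.
picked=$(awk 'BEGIN { for (i = 1; i <= 12; i++) print i }' |
         awk 'NR % 4 == 2 || NR % 4 == 3')

printf '%s\n' "$picked"
# prints 2, 3, 6, 7, 10, 11 (one per line)
```

This only works when every block occupies exactly four lines (number, timing, one text line, blank line).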

Answer 6

I'm not sure exactly what you are trying to do, but based on your desired output, this command produces the same result:

awk '!/^[0-9]*$/' text.sub
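A short sketch of what this pattern keeps and drops: /^[0-9]*$/ matches lines made up only of digits, and because * also allows zero digits it matches empty lines too, so negating it removes both the number lines and the blank separators. The sample below is made up for illustration and includes a text line that happens to be all digits ("2024") to show the one case where this approach also drops subtitle text.

```shell
input='1
00:01:03:321 --> 00:01:04:321
 Randomtext1

2
00:02:03:321 --> 00:03:04:321
2024'

# Keep only lines that are NOT entirely digits (or empty).
out=$(printf '%s\n' "$input" | awk '!/^[0-9]*$/')

printf '%s\n' "$out"
```

Here the final "2024" line is removed along with the block numbers, so this filter is safe only if no text line consists solely of digits.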

bash

awk

tail