How to add header text next to its content in an unformatted data set, side by side as delimiter-separated values, using sed/awk/python

Question

I have a long list of unformatted data, e.g. data.txt, where each set starts with a header and ends with a blank line, for example:

TypeA/Price:20$
alexmob
moblexto
unkntom

TypeB/Price:25$
moblexto2
unkntom0
alexmob3
poptop9
tyloret

TypeC/Price:30$
rtyuoper0
kunlohpe6
mobryhox

Now I want to put each set's header next to every one of its content lines, separated by a comma, like this:

alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$

moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$

rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$

That way, whenever I grep for a keyword, the matching content shows up together with its header, like this:

$ grep mob data.txt
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
moblexto2,TypeB/Price:25$
alexmob3,TypeB/Price:25$
mobryhox,TypeC/Price:30$

I'm new to bash scripting and Python and only recently started learning them, so any simple bash script (using sed/awk) or Python script would be much appreciated.

Answer 1

Using sed:

$ sed '/Type/{h;d;};/[a-z]/{G;s/\n/,/}' input_file
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$

moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$

rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$

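Lines matching Type are copied into the hold space (h) and then deleted (d), so the header line itself is not printed.

Lines containing a lowercase letter ([a-z]) get a newline plus the hold-space contents appended (G). Finally, s/\n/,/ replaces that newline with a comma.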
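To make the hold-space mechanics concrete, here is a rough Python model of this one sed script, for illustration only. The variable pattern stands for sed's pattern space and hold stands for its hold space. It is not a general sed interpreter: it models only the commands used here (h, d, G and s), and run_sed_script is an illustrative name.

```python
import re
from typing import Iterable, List


def run_sed_script(lines: Iterable[str]) -> List[str]:
    r"""Rough model of: sed '/Type/{h;d;};/[a-z]/{G;s/\n/,/}'"""
    hold = ""                                # sed's hold space starts out empty
    output = []
    for line in lines:
        pattern = line.rstrip("\n")          # pattern space = current line, without its newline
        if "Type" in pattern:                # /Type/ matches anywhere in the line
            hold = pattern                   # h: copy pattern space into hold space
            continue                         # d: delete pattern space, start next cycle (no print)
        if re.search(r"[a-z]", pattern):     # /[a-z]/: line has a lowercase letter
            pattern = pattern + "\n" + hold  # G: append a newline plus the hold space
            pattern = pattern.replace("\n", ",", 1)  # s/\n/,/: first newline becomes a comma
        output.append(pattern)               # end of cycle: pattern space is auto-printed
    return output
```

The model also makes one consequence of /[a-z]/ visible: a data line containing no lowercase letter, such as 12345, would be printed without a header.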

Answer 2

I would do this task with GNU AWK as follows. Let file.txt contain

TypeA/Price:20$
alexmob
moblexto
unkntom

TypeB/Price:25$
moblexto2
unkntom0
alexmob3
poptop9
tyloret

TypeC/Price:30$
rtyuoper0
kunlohpe6
mobryhox

then

awk '/^Type/{header=$0;next}{print /./?$0 ";" header:$0}' file.txt

output

alexmob;TypeA/Price:20$
moblexto;TypeA/Price:20$
unkntom;TypeA/Price:20$

moblexto2;TypeB/Price:25$
unkntom0;TypeB/Price:25$
alexmob3;TypeB/Price:25$
poptop9;TypeB/Price:25$
tyloret;TypeB/Price:25$

rtyuoper0;TypeC/Price:30$
kunlohpe6;TypeC/Price:30$
mobryhox;TypeC/Price:30$

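Explanation: if a line starts (^) with Type, set header to that line ($0) and go to the next line. Every other line is printed: if it contains at least one character (/./), the line ($0) is concatenated with ; and header; otherwise the line ($0) is printed as is.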

(tested in GNU Awk 5.0.1)
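The question also asks for Python. Below is a minimal sketch that follows the same line-by-line logic as the awk command above: remember the last header, append it to every non-empty line, and pass blank lines through. It assumes, as in the sample data, that a header is any line starting with Type. add_headers is an illustrative name, not part of any library. The default separator is a comma, as the question asks; pass sep=";" to reproduce this answer's output.

```python
import sys
from typing import Iterable, Iterator


def add_headers(lines: Iterable[str], sep: str = ",") -> Iterator[str]:
    """Yield every data line followed by sep and the header of its set.

    Assumption: a header is any line starting with "Type", as in the sample.
    """
    header = ""
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("Type"):
            header = line              # like /^Type/{header=$0; next}
        elif line:
            yield line + sep + header  # non-empty line, like /./ ? $0 ";" header
        else:
            yield line                 # blank separator line is printed as is


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python add_headers.py data.txt > data_with_headers.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for out in add_headers(f):
            print(out)
```

After converting the file once this way (add_headers.py is a hypothetical script name), grep mob data_with_headers.txt prints the five lines shown in the question.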

Answer 3

Using any awk in any shell on every Unix box, whatever characters the data contains:

$ awk -v RS= -F'\n' -v OFS=',' '{for (i=2;i<=NF;i++) print $i, $1; print ""}' file
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$

moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$

rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$
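Setting RS to the empty string puts awk in paragraph mode. Records are separated by runs of blank lines, and with -F'\n' each line of a record is one field, so $1 is the header and $2 to $NF are the data lines. The rough Python model below illustrates that splitting, for explanation only; it is not how awk is implemented, and paragraph_mode is an illustrative name.

```python
import re
from typing import List


def paragraph_mode(text: str, ofs: str = ",") -> List[str]:
    r"""Rough model of: awk -v RS= -F'\n' -v OFS=',' '{for (i=2;i<=NF;i++) print $i, $1; print ""}'"""
    output = []
    # RS="": leading/trailing newlines are ignored; 2+ consecutive newlines end a record
    records = [r for r in re.split(r"\n\n+", text.strip("\n")) if r]
    for record in records:
        fields = record.split("\n")              # -F'\n': each line of the record is a field
        header = fields[0]                       # $1
        for field in fields[1:]:                 # for (i=2; i<=NF; i++)
            output.append(field + ofs + header)  # print $i, $1 (joined by OFS)
        output.append("")                        # print "": blank line after each record
    return output
```

Several consecutive blank lines collapse into a single record separator, so the output still has exactly one blank line after each set.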
