Concatenate multiple txt files with the file name as the first column

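I want to concatenate multiple .txt files into one file and add the file name as the first column of each file's data, so I know which file each line came from. The code below does this, but only for the first line of each file.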

for i in *.txt; do echo -n "$i," && cat "$i"; done > tmpfile; mv tmpfile all-files.txt;

The desired output looks like this:

filename1.txt,COVERAGE SUMMARY,,Aligned bases in genome,80754336928,100.00
filename1.txt,COVERAGE SUMMARY,,Average alignment coverage over genome,26.55
filename2.txt,COVERAGE SUMMARY,,Aligned bases in genome,88896465740,100.00
filename2.txt,COVERAGE SUMMARY,,Average alignment coverage over genome,33.40

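I suggest using awk. Pass the file name in with -v instead of splicing it into a double-quoted program. In double quotes the shell also expands $0 (to the shell's or script's name) before awk ever sees it:

for f in *.txt; do awk -v f="$f" '{print f "," $0}' "$f"; done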
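For comparison, the same per-line prefixing can be done without awk. This is a minimal sketch in plain POSIX shell. It creates two made-up sample files (file1.txt, file2.txt) in a temporary directory so that it runs on its own, and it writes to combined.out, a hypothetical name chosen so that it does not match *.txt. Expect it to be much slower than awk on large files, because the shell reads one line at a time:

```shell
#!/bin/sh
# Sketch: prefix every line of every *.txt file with its file name, no awk.
# The sample files and combined.out are hypothetical, for illustration only.
set -eu
demo=$(mktemp -d)
cd "$demo"
printf 'a1\na2\n' > file1.txt
printf 'b1\nb2' > file2.txt      # last line deliberately has no trailing newline

for f in *.txt; do
  # IFS= keeps leading/trailing blanks, -r keeps backslashes literal,
  # and '|| [ -n "$line" ]' still emits a final line that lacks a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s,%s\n' "$f" "$line"
  done < "$f"
done > combined.out              # output name does not match *.txt

cat combined.out
```

The question's loop prefixes only once per file because echo -n runs once before cat dumps the whole file. Here the prefix is printed inside the per-line loop instead.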

Suggested gawk commands:

Print the file name followed by a comma on the first line of each file only:

gawk 'BEGINFILE{printf("%s,",FILENAME)}1' *.txt
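BEGINFILE is a gawk extension. If gawk is not available, FNR == 1 gives a similar first-line-only prefix in any POSIX awk, because FNR restarts at 1 for each input file. The two versions differ on empty files. BEGINFILE still runs for an empty file and prints its prefix with no newline, so that prefix ends up glued to the next file's first line. FNR == 1 never fires for an empty file. A minimal sketch with hypothetical sample files a.txt, b.txt (empty) and c.txt:

```shell
#!/bin/sh
# Sketch: first-line-only prefix without gawk's BEGINFILE.
# a.txt, b.txt and c.txt are hypothetical sample files.
set -eu
demo=$(mktemp -d)
cd "$demo"
printf 'x1\nx2\n' > a.txt
: > b.txt                        # empty file: gets no prefix with FNR == 1
printf 'y1\n' > c.txt

# FNR == 1 matches the first record of each file; the trailing 1 prints every line.
awk 'FNR == 1 { printf "%s,", FILENAME } 1' *.txt > first-only.out
cat first-only.out
```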

Print the file name followed by a comma on every line of each file (plain awk is enough here):

awk '{print FILENAME "," $0}' *.txt
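When the result is saved as all-files.txt in the same directory, as in the question, that file itself matches *.txt on the next run and would be read back in as input. The sketch below adapts the question's tmpfile idea to the per-line awk command. It writes to a name outside the glob, renames the file into place, and skips all-files.txt if it is already there. The two input files are hypothetical and reuse lines from the example output in the question:

```shell
#!/bin/sh
# Sketch: build all-files.txt so that re-running does not re-ingest it.
# filename1.txt / filename2.txt are hypothetical sample inputs.
set -eu
demo=$(mktemp -d)
cd "$demo"
printf 'COVERAGE SUMMARY,,Aligned bases in genome,80754336928,100.00\n' > filename1.txt
printf 'COVERAGE SUMMARY,,Average alignment coverage over genome,33.40\n' > filename2.txt

combine() {
  # Ignore a previous all-files.txt picked up by the glob, write to a
  # non-.txt temporary name, then move it into place.
  awk 'FILENAME != "all-files.txt" { print FILENAME "," $0 }' *.txt > all-files.tmp
  mv all-files.tmp all-files.txt
}

combine
first=$(cat all-files.txt)
combine                          # all-files.txt now matches *.txt but is skipped
second=$(cat all-files.txt)
cat all-files.txt
```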

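Three different variants. The sub() one is slightly faster: 2.441 s total versus 2.685 s and 2.744 s for the other two in the benchmark below.

Code (mawk or gawk):

mawk 'sub("^",FILENAME",")' FS='^$' *.txt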
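One caveat with the sub() variant comes from standard awk semantics, not from the benchmark. In sub()'s replacement string, & stands for the matched text. The match at ^ is empty, so any & in a file name silently disappears. A minimal sketch comparing sub() with plain concatenation, using a made-up file name R&D.txt:

```shell
#!/bin/sh
# Sketch: "&" in a file name is mangled by sub() but not by concatenation.
# R&D.txt is a hypothetical file name chosen to show the problem.
set -eu
demo=$(mktemp -d)
cd "$demo"
printf 'line\n' > 'R&D.txt'

# In sub(), "&" in the replacement means "the matched text" (empty here).
via_sub=$(awk 'sub(/^/, FILENAME ",")' 'R&D.txt')
# Plain string concatenation copies FILENAME literally.
via_concat=$(awk '{ $0 = FILENAME "," $0 } 1' 'R&D.txt')

echo "sub():  $via_sub"
echo "concat: $via_concat"
```

If none of your file names contain & (or backslashes, which are also special in the replacement), the sub() form is fine. Otherwise, the concatenation form is the safer choice.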

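Benchmark (about 2.34 GiB of input; mawk2, pvE9 and lgp3 appear to be local aliases, and ${m3t} holds the input file name):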

 ( time ( mawk2  '$NF=FILENAME","$NF' FS='^$' "${m3t}" )  | pvE9 >/dev/null)

 out9: 2.34GiB 0:00:02 [ 896MiB/s] [ 896MiB/s] [ <=> ]

 2.33s user 0.34s system 99% cpu 2.685 total

--------

 ( time ( mawk2  '$!_=FILENAME","$!_' FS='^$' "${m3t}" )  | pvE9 >/dev/null) | lgp3 

 out9: 2.34GiB 0:00:02 [ 876MiB/s] [ 876MiB/s] [<=> ]

 2.38s user 0.34s system 99% cpu 2.744 total

--------

( time ( mawk2  'sub("^",FILENAME",")' FS='^$' "${m3t}" )  | pvE9 >/dev/null) | lgp3 
 
 out9: 2.34GiB 0:00:02 [ 987MiB/s] [ 987MiB/s] [ <=> ]

 2.10s user 0.32s system 99% cpu 2.441 total