Concatenate multiple txt files with the file name as the first column
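I want to concatenate multiple .txt files into one file and add the file name as the first column of every line (so I know which file each row came from). The code below does this, but only for the first line of each file.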
for i in *.txt; do echo -n "$i," && cat "$i"; done > tmpfile; mv tmpfile all-files.txt;
The output I want looks like this:
filename1.txt,COVERAGE SUMMARY,,Aligned bases in genome,80754336928,100.00
filename1.txt,COVERAGE SUMMARY,,Average alignment coverage over genome,26.55
filename2.txt,COVERAGE SUMMARY,,Aligned bases in genome,88896465740,100.00
filename2.txt,COVERAGE SUMMARY,,Average alignment coverage over genome,33.40
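I suggest using awk. Pass the file name in with -v so the shell does not expand $0 inside the awk program:
for f in *.txt; do awk -v f="$f" '{print f "," $0}' "$f"; done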
I suggest these gawk commands.
Print the file name followed by a comma before the first line of each file:
gawk 'BEGINFILE{printf("%s,",FILENAME)}1' *.txt
Print the file name followed by a comma on every line of each file:
awk '{print FILENAME "," $0}' *.txt
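For reuse, the per-line command can be wrapped in a small shell function. This is only a minimal sketch: the function name prefix_with_filename and the output name all-files.csv are made up for illustration and are not part of the answer above.

```shell
# Hypothetical helper (name chosen for illustration): prefix every line of
# each file given as an argument with that file's name and a comma.
prefix_with_filename() {
    awk '{ print FILENAME "," $0 }' "$@"
}

# Example usage: write the result to a name the *.txt glob does not match,
# so a rerun does not read the combined file back in as input.
# prefix_with_filename *.txt > all-files.csv
```

Writing to a different extension (or directory) is a design choice, not a requirement. It just avoids the case where an existing all-files.txt is matched by *.txt on the next run.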
Three different variants; the sub() one is slightly faster:
Code
mawk 'sub("^",FILENAME",")' FS='^$' *.txt    # gawk works the same way
Benchmarks
( time ( mawk2 '$NF=FILENAME","$NF' FS='^$' "${m3t}" ) | pvE9 >/dev/null)
out9: 2.34GiB 0:00:02 [ 896MiB/s] [ 896MiB/s] [ <=> ]
2.33s user 0.34s system 99% cpu 2.685 total
————————
( time ( mawk2 '$!_=FILENAME","$!_' FS='^$' "${m3t}" ) | pvE9 >/dev/null) | lgp3
out9: 2.34GiB 0:00:02 [ 876MiB/s] [ 876MiB/s] [<=> ]
2.38s user 0.34s system 99% cpu 2.744 total
————————
( time ( mawk2 'sub("^",FILENAME",")' FS='^$' "${m3t}" ) | pvE9 >/dev/null) | lgp3
out9: 2.34GiB 0:00:02 [ 987MiB/s] [ 987MiB/s] [ <=> ]
2.10s user 0.32s system 99% cpu 2.441 total
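One caveat with the sub() variant: awk treats & in the replacement string as "the matched text", and here the match is empty, so a file name containing & loses that character. The sketch below compares it with the plain print form on a throwaway file; the directory, file name, and variable names are made up for the demonstration.

```shell
# Hypothetical demo: a file whose name contains "&".
demo_dir=$(mktemp -d)
printf 'x,1\n' > "$demo_dir/a&b.txt"
cd "$demo_dir" || exit 1

# sub() variant: "&" in the replacement stands for the (empty) match.
with_sub=$(awk 'sub("^", FILENAME ",")' FS='^$' 'a&b.txt')

# print variant: the file name is concatenated as-is.
with_print=$(awk '{ print FILENAME "," $0 }' 'a&b.txt')

printf 'sub():  %s\n' "$with_sub"
printf 'print:  %s\n' "$with_print"
```

If file names may contain & or backslashes, the print form is the safer choice, even though it is a little slower in the timings above.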