Create a CSV file based on variables in AWK
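This may seem relatively simple to some people, but I have spent a lot of time on it and it still does not work. What I want to do is create a comma-separated csv file, using the fastq names from the listing below as the values for fastq_1 (M1) and fastq_2 (M2), together with the variables. The csv header names should be sample, fastq_1, fastq_2, strandedness, and each variable and file name must end up in the column under its matching header.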
fastq folder
S1_1.fastq.gz
S1_2.fastq.gz
S2_1.fastq.gz
S2_2.fastq.gz
S3_1.fastq.gz
S3_2.fastq.gz
S4_1.fastq.gz
S4_2.fastq.gz
# variables
sample="mouse"
M1=$(ls *_1.fastq.gz)
M2=$(ls *_2.fastq.gz)
strandedness="paired"
#code
awk '
BEGIN { OFS=",";
print "sample", "fastq_1", "fastq_2", "strandedness"
}
FNR==NR {
print $sample, $M1, $M2, $strandedness
}' > output.csv
Expected output
sample, fastq_1, fastq_2, strandedness #header
mouse, S1_1.fastq.gz, S1_2.fastq.gz, paired #values
mouse, S2_1.fastq.gz, S2_2.fastq.gz, paired #values
mouse, S3_1.fastq.gz, S3_2.fastq.gz, paired #values
mouse, S4_1.fastq.gz, S4_2.fastq.gz, paired #values
I would be glad if someone could help me with this.
Pure bash might be easier than awk:
#!/bin/bash
sample=mouse
strandedness=paired
fastq_folder=./
{
    # header
    printf '%s, %s, %s, %s\n' sample fastq_1 fastq_2 strandedness
    # values
    for fastq_1 in "$fastq_folder"/*_1.fastq.gz
    do
        fastq_2="${fastq_1%_1.fastq.gz}_2.fastq.gz"
        [[ -f $fastq_2 ]] || continue # you may display an error message
        printf '%s, %s, %s, %s\n' \
            "$sample" \
            "${fastq_1##*/}" \
            "${fastq_2##*/}" \
            "$strandedness"
    done
} > output.csv
output.csv:
sample, fastq_1, fastq_2, strandedness
mouse, S1_1.fastq.gz, S1_2.fastq.gz, paired
mouse, S2_1.fastq.gz, S2_2.fastq.gz, paired
mouse, S3_1.fastq.gz, S3_2.fastq.gz, paired
mouse, S4_1.fastq.gz, S4_2.fastq.gz, paired
Remark: adding a space after the comma may look prettier, but in CSV terms doing so adds a space character to the data.
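To illustrate the remark above, here is a minimal sketch of the same loop that writes the separators without the extra space. The function name `write_pairs_csv` and its three positional arguments are hypothetical, introduced only for this example. It sticks to POSIX shell constructs, so it should also run unchanged in bash.

```shell
# Hypothetical helper, not part of the original answer.
# Usage: write_pairs_csv SAMPLE STRANDEDNESS FOLDER > output.csv
write_pairs_csv() {
    # header, with no space after the commas
    printf '%s,%s,%s,%s\n' sample fastq_1 fastq_2 strandedness
    for fastq_1 in "$3"/*_1.fastq.gz
    do
        fastq_2="${fastq_1%_1.fastq.gz}_2.fastq.gz"
        # Skip unpaired files, and the unexpanded pattern when nothing matches.
        [ -f "$fastq_1" ] && [ -f "$fastq_2" ] || continue
        printf '%s,%s,%s,%s\n' \
            "$1" \
            "${fastq_1##*/}" \
            "${fastq_2##*/}" \
            "$2"
    done
}
```

Calling `write_pairs_csv mouse paired . > output.csv` in the fastq folder should give rows such as `mouse,S1_1.fastq.gz,S1_2.fastq.gz,paired`, so a CSV parser sees the field values without a leading space.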
$ ls fastq_folder
S1_1.fastq.gz S2_1.fastq.gz S3_1.fastq.gz S4_1.fastq.gz
S1_2.fastq.gz S2_2.fastq.gz S3_2.fastq.gz S4_2.fastq.gz
$ cat tst.awk
BEGIN {
    OFS=","
    print "sample", "fastq_1", "fastq_2", "strandedness"
    for (i=1; i<ARGC; i++) {
        sub(".*/","",ARGV[i])
        file1 = file2 = ARGV[i]
        sub(/_1/,"_2",file2)
        print sample, file1, file2, strandedness
    }
    exit
}
$ awk -v sample="$sample" -v strandedness="$strandedness" -f tst.awk fastq_folder/*_1.fastq.gz
sample,fastq_1,fastq_2,strandedness
mouse,S1_1.fastq.gz,S1_2.fastq.gz,paired
mouse,S2_1.fastq.gz,S2_2.fastq.gz,paired
mouse,S3_1.fastq.gz,S3_2.fastq.gz,paired
mouse,S4_1.fastq.gz,S4_2.fastq.gz,paired
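The above assumes that the files are always paired, as you stated in the comments, and that there are not so many files that the command line exceeds the system's ARG_MAX limit.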
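If the ARG_MAX limit ever became a concern, one possible workaround is to feed the file names to awk on standard input instead of on its command line. The sketch below relies on `printf` being a shell builtin, so the expanded glob is never passed to an external command. The function name `list_pairs_csv` is hypothetical, and the sketch assumes that at least one file matches the pattern, that file names contain no newlines, and (like the script above) that every `_1` file has a `_2` partner.

```shell
# Hypothetical helper, not part of the original answer.
# Usage: list_pairs_csv SAMPLE STRANDEDNESS FOLDER > output.csv
list_pairs_csv() {
    # printf is a builtin, so the glob expansion is not limited by ARG_MAX.
    printf '%s\n' "$3"/*_1.fastq.gz |
    awk -v sample="$1" -v strandedness="$2" '
        BEGIN {
            OFS = ","
            print "sample", "fastq_1", "fastq_2", "strandedness"
        }
        {
            sub(".*/", "")        # strip the directory part of the path
            file1 = file2 = $0
            # Anchor on the full suffix so only the trailing _1 is replaced.
            sub(/_1\.fastq\.gz$/, "_2.fastq.gz", file2)
            print sample, file1, file2, strandedness
        }'
}
```

Here awk reads one file name per input line rather than looping over ARGV, which is the only structural change from tst.awk.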