Redirect wget screen output to a log file in bash

First of all, thanks for your help. I have the following file, which contains a list of URLs:

Salmonella_enterica_subsp_enterica_Typhi    https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/717/755/GCF_003717755.1_ASM371775v1/GCF_003717755.1_ASM371775v1_translated_cds.faa.gz
Salmonella_enterica_subsp_enterica_Paratyphi_A  https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/818/115/GCF_000818115.1_ASM81811v1/GCF_000818115.1_ASM81811v1_translated_cds.faa.gz
Salmonella_enterica_subsp_enterica_Paratyphi_B  https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/018/705/GCF_000018705.1_ASM1870v1/GCF_000018705.1_ASM1870v1_translated_cds.faa.gz
Salmonella_enterica_subsp_enterica_Infantis https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/011/182/555/GCA_011182555.2_ASM1118255v2/GCA_011182555.2_ASM1118255v2_translated_cds.faa.gz
Salmonella_enterica_subsp_enterica_Typhimurium_LT2  https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/006/945/GCF_000006945.2_ASM694v2/GCF_000006945.2_ASM694v2_translated_cds.faa.gz
Salmonella_enterica_subsp_diarizonae    https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/324/755/GCF_003324755.1_ASM332475v1/GCF_003324755.1_ASM332475v1_translated_cds.faa.gz
Salmonella_enterica_subsp_arizonae  https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/635/675/GCA_900635675.1_31885_G02/GCA_900635675.1_31885_G02_translated_cds.faa.gz
Salmonella_bongori  https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/006/113/225/GCF_006113225.1_ASM611322v2/GCF_006113225.1_ASM611322v2_translated_cds.faa.gz

I have to download these URLs with wget. I have managed to download them, but the usual output shows up in the shell:

--2021-04-23 02:49:00--  https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/635/675/GCA_900635675.1_31885_G02/GCA_900635675.1_31885_G02_translated_cds.faa.gz
Reusing existing connection to ftp.ncbi.nlm.nih.gov:443.
HTTP request sent, awaiting response... 200 OK
Length: 1097880 (1,0M) [application/x-gzip]
Saving to: ‘GCA_900635675.1_31885_G02_translated_cds.faa.gz’

GCA_900635675.1_31885_G0 100%[=================================>]   1,05M  2,29MB/s    in 0,5s    

2021-04-23 02:49:01 (2,29 MB/s) - ‘GCA_900635675.1_31885_G02_translated_cds.faa.gz’ saved [1097880/1097880]

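I would like to redirect that output to a log file. Also, as the files are downloaded, I want to decompress them, because they are gzip-compressed (.gz). My code is as follows: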

cat $ncbi_urls_file | while read line
do
    echo " Downloading fasta files from NCBI..."
    awk '{print $2}' | wget -i-
done

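In bash, standard output can be redirected to a file with the >> operator (which appends to the file) or the > operator (which truncates/overwrites the file). For example,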

echo hello >> log.txt

will append "hello" to log.txt. If you still want to see the output in your terminal and also write it to a log file, you can use tee:

echo hello | tee log.txt
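
As a small illustration of the three forms above, the following sketch writes to a temporary file rather than log.txt (the variable name log and the use of mktemp are assumptions made for the example). Note that plain tee overwrites its target, so tee -a is used here to append:

```shell
#!/usr/bin/env bash
# Temporary log file so the example does not touch any real log.
log=$(mktemp)

echo "first run" > "$log"     # >  truncates the file, then writes
echo hello >> "$log"          # >> appends to whatever is already there
echo world | tee -a "$log"    # prints "world" and appends it to the file

cat "$log"                    # the file now holds all three lines
```

Running the first line again would discard the other two lines, which is why >> or tee -a is usually what you want for a log.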

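However, wget writes most of its basic progress information to standard error rather than standard output. This is actually a very common practice. Displaying progress information often involves special characters that overwrite lines (for example, to update a progress bar), change terminal colors, and so on. A terminal can handle these characters sensibly in real time, but storing them in a file usually doesn't make much sense. For this reason, incremental progress output is usually kept separate from the output that is worth storing in a log file, so that each can be redirected on its own; that is why progress information is typically written to standard error instead of standard output.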

That said, you can still redirect standard error to a log file:

wget example.com 2>> log.txt

Or with tee:

wget example.com 2>&1 | tee log.txt

(2>&1 redirects standard error to standard output, which is then piped to tee.)
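
To see the difference between the two streams without touching the network, here is a minimal sketch in which a hypothetical function report stands in for wget: it writes one line to standard output and one to standard error. The function name and the temporary log file are assumptions for the example only.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a program that, like wget, reports progress on
# standard error and writes its actual result to standard output.
report() {
    echo "result"          # goes to standard output (fd 1)
    echo "progress" >&2    # goes to standard error (fd 2)
}

log=$(mktemp)

# Only standard error is appended to the log; "result" still reaches the terminal.
report 2>> "$log"

# Both streams are merged into the pipe, shown on the terminal by tee,
# and appended to the log (-a keeps the earlier contents).
report 2>&1 | tee -a "$log"

cat "$log"
```

The order matters in the second form: 2>&1 has to appear before the pipe so that standard error is already pointing at standard output when the pipeline is set up.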

wget does have options that allow logging to a file. From man wget:

Logging and Input File Options

-o logfile
--output-file=logfile
    Log all messages to logfile. The messages are normally reported to standard error. 
-a logfile
--append-output=logfile
    Append to logfile. This is the same as -o, only it appends to logfile instead of overwriting the old log file. If logfile does not exist, a new file is created. 
-d
--debug
    Turn on debug output, meaning various information important to the developers of Wget if it does not work properly. Your system administrator may have chosen to compile Wget without debug support, in which case -d will not work. Please note that compiling with debug support is always safe---Wget compiled with the debug support will not print any debug info unless requested with -d. 
-q
--quiet
    Turn off Wget's output. 
-v
--verbose
    Turn on verbose output, with all the available data. The default output is verbose. 
-nv
--no-verbose
    Turn off verbose without being completely quiet (use -q for that), which means that error messages and basic information still get printed.

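You will need to experiment to get exactly what you want. If you need all the logs in a single file, use -a log.out, which makes wget append its log information to that file instead of writing it to stderr.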
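
Putting this together for the file in the question, one possible sketch is shown below. It assumes the URL is the second whitespace-separated column, that the downloads land in the current directory, and that gunzip is available for the decompression step the question asks about. The function name download_and_unpack and the default log name wget.log are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: download every URL in column 2 of a two-column file, append wget's
# messages to a log file, then decompress the downloaded .gz files.
# The function name and the default log name are hypothetical.
download_and_unpack() {
    local urls_file=$1
    local log_file=${2:-wget.log}
    local url file

    echo "Downloading fasta files from NCBI..."
    # awk prints the second field (the URL); "wget -i -" reads the URL list
    # from standard input, and "-a" appends wget's messages to the log file
    # instead of sending them to standard error.
    awk '{print $2}' "$urls_file" | wget -a "$log_file" -i -

    # Decompress whatever was downloaded; gunzip replaces x.gz with x.
    while read -r _ url || [ -n "$url" ]; do
        [ -n "$url" ] || continue
        file=$(basename "$url")
        if [ -f "$file" ]; then
            gunzip -f -- "$file"
        fi
    done < "$urls_file"
}
```

It could then be called as download_and_unpack "$ncbi_urls_file" download.log. No while loop around wget is needed, because wget -i - already processes the whole list in one invocation.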