Multiple reads from a txt file in bash (parallel processing)
Here is a simple bash script for HTTP status codes:
while read -r url
do
  urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "${url}" --max-time 5)
  echo "$url $urlstatus" >> urlstatus.txt
done < urls.txt   # urls.txt is a placeholder; the original file name was lost
I am reading URLs from a text file, but it processes only one at a time, which takes too much time. GNU parallel and xargs also process one line at a time (tested). How can I process the URLs simultaneously to improve the timing? In other words, I want threading over the URL file rather than over the bash commands (which is what GNU parallel and xargs do).
The input file is a txt file, with lines like:
ABC.Com
Bcd.Com
Any.Google.Com
Something like this
GNU parallel and xargs also process one line at a time (tested).
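A quick way to check whether several processes really run at once is to time stand-in jobs. This is a minimal sketch, using xargs -P with sleeps in place of the slow curl calls:

```shell
#!/bin/sh
# Sketch: xargs -P4 runs up to 4 jobs concurrently, so 8 half-second
# sleeps finish in roughly 1 second instead of 4.
start=$(date +%s)
printf '%s\n' 1 2 3 4 5 6 7 8 | xargs -n1 -P4 sh -c 'sleep 0.5' _
end=$(date +%s)
echo "elapsed: $((end - start))s"
```

If the elapsed time stays near the sequential total, the jobs are not actually running in parallel.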
Can you give an example? If you use -j
then you should be able to run multiple processes at a time.
I would do it like this:
doit() {
  url="$1"
  urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "${url}" --max-time 5)
  echo "$url $urlstatus"
}
export -f doit
cat urls.txt | parallel -j0 -k doit >> urlstatus.txt   # urls.txt stands in for the lost file name
Based on the input:
Input file is txt file and lines are separated as
ABC.Com
Bcd.Com
Any.Google.Com
Something like this
www.google.com
pi.dk
I get this output:
Input file is txt file and lines are separated as 000
ABC.Com 301
Bcd.Com 301
Any.Google.Com 000
Something like this 000
www.google.com 302
pi.dk 200
which looks correct:
000 if domain does not exist
301/302 for redirection
200 for success
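If installing GNU parallel is not an option, plain bash background jobs give a similar effect. A minimal sketch (check_urls is a made-up name; it reads URLs from stdin):

```shell
#!/bin/bash
# Sketch: run every curl in the background, then wait for all of them.
# Unlike parallel -k, output order is NOT preserved.
check_urls() {
  while read -r url; do
    {
      urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$url" --max-time 5)
      echo "$url $urlstatus"
    } &
  done
  wait   # block until every background curl has finished
}
# Usage: check_urls < urls.txt >> urlstatus.txt
```

Note that this forks one process per line with no cap, so for very large URL files it can exhaust system resources; parallel -j0 manages the job count for you.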