wget - How to skip a file that is not found?
I use wget to download files from the Internet and use the -O option to save the images under a custom filename. Sometimes the file is not found and a 404 error code is returned. For example, I run this command:
wget 'http://www.example.com/path/to/image/file01928.jpg' -O myimagefile.jpg
The result is:
root@localhost:~# wget 'http://www.example.com/path/to/image/file01928.jpg' -O myimagefile.jpg
--2015-09-13 23:11:07-- http://www.example.com/path/to/image/file01928.jpg
Resolving www.example.com (www.example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to www.example.com (www.example.com)|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2015-09-13 23:11:07 ERROR 404: Not Found.
Even though the file was not found, a file is still saved on my hard disk:
root@localhost:~# ls
myimagefile.jpg
Is there a way to skip the download (and not create the file) when the remote file is not found? Which option should I use?
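(For context, one clumsy workaround, shown only as a sketch: wget exits with a non-zero status on a 404, so the stray file could be deleted again afterwards. I would prefer an option that never creates it.)

wget 'http://www.example.com/path/to/image/file01928.jpg' -O myimagefile.jpg \
    || rm -f myimagefile.jpg   # remove the leftover file if wget reported an error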
You can perform a HEAD request to see whether the resource (the image) exists and only download it if it does. You can run wget with -S to print the headers and with --spider to check for the resource without downloading it.
From man wget:
-S
--server-response
Print the headers sent by HTTP servers and responses sent by FTP servers.
--spider
When invoked with this option, Wget will behave as a Web spider, which means that
it will not download the pages, just check that they are there. For example, you
can use Wget to check your bookmarks:
wget --spider --force-html -i bookmarks.html
This feature needs much more work for Wget to get close to the functionality of
real web spiders.
Here is an example:
#!/bin/bash

URL='http://www.google.com'
echo "Checking $URL"
if wget -S --spider "$URL" 2>&1 | grep -q 'Remote file exists'; then
    echo "Found $URL, going to fetch it"
    wget "$URL" -O google.html
else
    echo "Url $URL does not exist!"
fi

URL='http://www.example.com/path/to/image/file01928.jpg'
echo "Checking $URL"
if wget -S --spider "$URL" 2>&1 | grep -q 'Remote file exists'; then
    echo "Found $URL, going to fetch it"
    wget "$URL" -O myimagefile.jpg
else
    echo "Url $URL does not exist!"
fi
Output:
Checking http://www.google.com
Found http://www.google.com, going to fetch it
--2015-09-14 05:26:34-- http://www.google.com/
Resolving www.google.com (www.google.com)... 74.125.239.144, 74.125.239.145, 74.125.239.146, ...
Connecting to www.google.com (www.google.com)|74.125.239.144|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘google.html’
[ <=> ] 18,684 --.-K/s in 0.001s
2015-09-14 05:26:34 (13.9 MB/s) - ‘google.html’ saved [18684]
Checking http://www.example.com/path/to/image/file01928.jpg
Url http://www.example.com/path/to/image/file01928.jpg does not exist!
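A variation on the same idea, as a minimal sketch (assuming a reasonably recent wget): instead of grepping the spider output, you can test wget's exit status directly, since wget --spider exits non-zero (8, "server issued an error response") when the server returns a 404.

#!/bin/bash
# Same check-then-fetch idea, but using wget's exit status instead of grep.
# wget --spider exits non-zero (8 for a server error such as 404), so the
# body of the if runs only when the remote file actually exists.
URL='http://www.example.com/path/to/image/file01928.jpg'
if wget -q --spider "$URL"; then
    wget -q "$URL" -O myimagefile.jpg
else
    echo "Url $URL does not exist!" >&2
fi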