How to download images from BeautifulSoup?
Image: http://i.imgur.com/OigSBjF.png
import requests
from bs4 import BeautifulSoup

r = requests.get("xxxxxxxxx")
soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("img")
for link in links:
    if "http" in link.get('src'):
        print(link.get('src'))
I get the URLs printed out, but I don't know how to use them.
You need to download the content and write it to disk:
import requests
from bs4 import BeautifulSoup
from os.path import basename

r = requests.get("xxx")
soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("img")
for link in links:
    if "http" in link.get('src'):
        lnk = link.get('src')
        with open(basename(lnk), "wb") as f:
            f.write(requests.get(lnk).content)
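One caveat (my own note, not from the answer above): `basename` keeps any query string, so a URL like `.../pic.png?width=640` would be saved as `pic.png?width=640`. A small sketch that strips the query before taking the basename:

```python
from os.path import basename
from urllib.parse import urlsplit

def filename_from_url(url):
    """Derive a local filename: drop the query string, then take the basename."""
    return basename(urlsplit(url).path)

print(filename_from_url("http://i.imgur.com/OigSBjF.png?width=640"))
# -> OigSBjF.png
```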
You can also use select to filter your tags, getting only those with http links:
for link in soup.select('img[src^="http"]'):
    lnk = link["src"]
    with open(basename(lnk), "wb") as f:
        f.write(requests.get(lnk).content)
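Note that filtering on "http" skips relative src values such as `/images/cat.png`. If you want those too, one option (my own addition) is to resolve them against the page URL with `urljoin` before downloading; the URLs below are hypothetical:

```python
from urllib.parse import urljoin

def resolve_src(page_url, src):
    """Resolve a possibly-relative <img src> against the URL of the page it came from."""
    return urljoin(page_url, src)

# relative src gets anchored to the page's host
print(resolve_src("https://example.site/gallery/", "/images/cat.png"))
# -> https://example.site/images/cat.png

# absolute src is returned unchanged
print(resolve_src("https://example.site/gallery/", "http://cdn.example/img.jpg"))
# -> http://cdn.example/img.jpg
```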
While the other answers are perfectly correct, I found the downloads really slow, with no way to see progress on high-resolution images. So I made this:
from bs4 import BeautifulSoup
import requests
import subprocess

url = "https://example.site/page/with/images"
html = requests.get(url).text  # get the html
soup = BeautifulSoup(html, "lxml")  # give the html to soup

# get all the anchor links with the custom class;
# the element or the class name will change based on your case
imgs = soup.findAll("a", {"class": "envira-gallery-link"})
for img in imgs:
    imgUrl = img['href']  # get the href from the tag
    cmd = ['wget', imgUrl]  # just download it using wget
    subprocess.Popen(cmd)  # runs the downloads in parallel
    # if you don't want to run them in parallel and would rather
    # wait for each image to finish, use communicate() instead:
    # subprocess.Popen(cmd).communicate()
Caveat: this won't work on Windows or macOS, since it relies on wget.
Bonus: if you don't use communicate(), you can see the download progress of each image.
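If you need something cross-platform, a rough pure-requests sketch of the same idea (no wget) might look like the following; the streaming and progress-line details are my own assumptions, not part of the answer above:

```python
import requests
from os.path import basename

def progress_line(name, received, total):
    # one-line progress summary; printed with '\r' so it overwrites itself
    return f"{name}: {received}/{total} bytes"

def download_with_progress(url, chunk_size=8192):
    """Stream one file to disk, showing a crude byte-count progress line."""
    filename = basename(url)
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        total = int(r.headers.get("content-length", 0))
        received = 0
        with open(filename, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                received += len(chunk)
                print("\r" + progress_line(filename, received, total), end="")
    print()
```

Unlike the wget version, this downloads sequentially; for parallelism you could run it in threads, since the work is I/O-bound.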