Is there a way I can download images from any search engine with code like this?
I tried to download images from Bing into a directory, but for some reason the code just runs and gives me nothing, not even an error. I also set a User-Agent header, but it still doesn't seem to work. What should I do?
from bs4 import BeautifulSoup
import requests
from PIL import Image
from io import BytesIO

url = 'https://www.bing.com/search'
search = input("Search for: ")
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
params = {"q": search}

r = requests.get(url, headers=headers, params=params)
soup = BeautifulSoup(r.text, "html.parser")
links = soup.findAll("a", {"class": "thumb"})

for item in links:
    img_obj = requests.get(item.attrs["href"])
    print("Getting", item.attrs["href"])
    title = item.attrs["href"].split("/")[-1]
    img = Image.open(BytesIO(img_obj.content))
    img.save("./scraped_images/" + title, img.format)
To get the images, you need to add /images to the link, i.e. query https://www.bing.com/images/search instead of https://www.bing.com/search. Here is an example of how to modify your code:
from bs4 import BeautifulSoup
from PIL import Image
from io import BytesIO
import requests
import json
import os

search = input("Search for: ")
url = "https://www.bing.com/images/search"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0"
}
params = {"q": search, "form": "HDRSC2", "first": "1", "scenario": "ImageBasicHover"}

r = requests.get(url, headers=headers, params=params)
soup = BeautifulSoup(r.text, "html.parser")

# Make sure the output directory exists, otherwise img.save() raises FileNotFoundError.
os.makedirs("scraped_images", exist_ok=True)

# Each <a class="iusc"> result carries a JSON blob in its "m" attribute;
# the "murl" field inside it is the URL of the full-size image.
for data in soup.find_all("a", {"class": "iusc"}):
    json_data = json.loads(data["m"])
    img_link = json_data["murl"]
    img_object = requests.get(img_link, headers=headers)
    title = img_link.split("/")[-1]
    print("Getting: ", img_link)
    print("Title: ", title + "\n")
    img = Image.open(BytesIO(img_object.content))
    img.save("./scraped_images/" + title)