How to save images to a folder from web scraping? (Python)
How do I store every image I get from web scraping into a folder? I am currently using Google Colab since I am just practicing, and I would like to store them in a folder in my Google Drive.
Here is my web scraping code:
import requests
from bs4 import BeautifulSoup

def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata('https://www.yahoo.com/')
soup = BeautifulSoup(htmldata, 'html.parser')
imgdata = []
for i in soup.find_all('img'):
    imgdata = i['src']
    print(imgdata)
I manually created a pics folder in the directory the script runs from to store the pictures in. Then I changed your code in the for loop so that the urls are appended to the imgdata list. The try/except block is there because not every url in the list is valid.
import requests
from bs4 import BeautifulSoup

def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata('https://www.yahoo.com/')
soup = BeautifulSoup(htmldata, 'html.parser')
imgdata = []
for i in soup.find_all('img'):
    imgdata.append(i['src'])  # changed here so it appends to the list

filename = "pics/picture{}.jpg"
for i in range(len(imgdata)):
    print(f"img {i+1} / {len(imgdata)}")
    # try block because not everything in the imgdata list is a valid url
    try:
        r = requests.get(imgdata[i], stream=True)
        with open(filename.format(i), "wb") as f:
            f.write(r.content)
    except Exception:
        print("Url is not valid")
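And since the question mentions Google Colab and Google Drive: you can mount your Drive in the notebook and point the filename template at a folder inside it. A minimal sketch, assuming it runs inside Colab (drive.mount comes with the google.colab package that Colab provides; the MyDrive/pics path is just an example):

from google.colab import drive  # only available inside Colab
import os

drive.mount('/content/drive')  # prompts for authorization on first run

# example target folder inside your Drive; adjust the path as needed
save_dir = '/content/drive/MyDrive/pics'
os.makedirs(save_dir, exist_ok=True)

# reuse the download loop above with this template instead of "pics/picture{}.jpg"
filename = os.path.join(save_dir, 'picture{}.jpg')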