Why can't I download images from Google with Python?
This code used to download a batch of images from Google for me. It worked a few days ago, and now it has suddenly stopped working.
Code:
# Import the google_images_download module
from google_images_download import google_images_download

# Create the downloader object
response = google_images_download.googleimagesdownload()

search_queries = ['Apple', 'Orange', 'Grapes', 'water melon']


def downloadimages(query):
    # keywords is the search query
    # format is the image file format
    # limit is the number of images to be downloaded
    # print_urls prints the image file URLs
    # size is the image size, which can be
    # specified manually ("large, medium, icon")
    # aspect_ratio denotes the height-to-width ratio
    # of images to download ("tall, square, wide, panoramic")
    arguments = {"keywords": query,
                 "format": "jpg",
                 "limit": 4,
                 "print_urls": True,
                 "size": "medium",
                 "aspect_ratio": "panoramic"}
    try:
        response.download(arguments)

    # Handle FileNotFoundError by retrying without the aspect ratio
    except FileNotFoundError:
        arguments = {"keywords": query,
                     "format": "jpg",
                     "limit": 4,
                     "print_urls": True,
                     "size": "medium"}

        # Retry the download with the reduced arguments
        try:
            response.download(arguments)
        except Exception:
            # Any error on the retry is silently ignored
            pass


# Driver code
for query in search_queries:
    downloadimages(query)
    print()
Output log:
Item no.: 1 --> Item name = Apple Evaluating... Starting Download...
Unfortunately all 4 could not be downloaded because some images were
not downloadable. 0 is all we got for this search filter!
Errors: 0
Item no.: 1 --> Item name = Orange Evaluating... Starting Download...
Unfortunately all 4 could not be downloaded because some images were
not downloadable. 0 is all we got for this search filter!
Errors: 0
Item no.: 1 --> Item name = Grapes Evaluating... Starting Download...
Unfortunately all 4 could not be downloaded because some images were
not downloadable. 0 is all we got for this search filter!
Errors: 0
Item no.: 1 --> Item name = water melon Evaluating... Starting
Download...
Unfortunately all 4 could not be downloaded because some images were
not downloadable. 0 is all we got for this search filter!
Errors: 0
This does create a folder, but there are no images inside it.
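To confirm the symptom independently of the library's log, a small check like the one below can count the files in each keyword folder. This is only a sketch: the `downloads` root folder and the helper name `count_downloaded_files` are assumptions for illustration, so point it at whatever output directory your run actually used.

```python
from pathlib import Path


def count_downloaded_files(root="downloads"):
    """Return {subfolder name: number of files} for each folder under root.

    The default root "downloads" is an assumption; pass your own output folder.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        # Nothing was created at all
        return {}
    return {
        folder.name: sum(1 for item in folder.iterdir() if item.is_file())
        for folder in sorted(root_path.iterdir())
        if folder.is_dir()
    }


if __name__ == "__main__":
    for keyword, count in count_downloaded_files().items():
        print(f"{keyword}: {count} file(s)")
```

A count of 0 for every keyword matches the "0 is all we got for this search filter!" lines in the log.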
I think Google is changing the DOM. The element with class="rg_meta notranslate" no longer exists. It has been replaced by class="rg_i ...".
import sys
import urllib.request

import wget
from bs4 import BeautifulSoup


def get_soup(url, header):
    # Python 3 port: urllib.request replaces the Python 2 urllib2 module
    request = urllib.request.Request(url, headers=header)
    return BeautifulSoup(urllib.request.urlopen(request), 'html.parser')


def main(args):
    query = "typical face"
    query = '+'.join(query.split())
    url = "https://www.google.co.in/search?q=" + query + "&source=lnms&tbm=isch"
    headers = {}
    headers['User-Agent'] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
    soup = get_soup(url, headers)
    # Image results are now <img class="rg_i ..."> elements
    for a in soup.find_all("img", {"class": "rg_i"}):
        wget.download(a.attrs["data-iurl"], a.attrs["data-iid"])


if __name__ == '__main__':
    try:
        main(sys.argv)
    except KeyboardInterrupt:
        pass
    sys.exit()
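To see what the selector change means without hitting Google at all, here is a rough offline sketch that uses only the standard library. The HTML snippet, the `ImageTagCollector` class, and the `find_images` helper are made up for illustration; only the `rg_i` class and the `data-iurl` / `data-iid` attribute names come from the answer above, and Google's real markup may differ or change again.

```python
from html.parser import HTMLParser


class ImageTagCollector(HTMLParser):
    """Collect attributes of <img> tags whose class list contains target_class."""

    def __init__(self, target_class="rg_i"):
        super().__init__()
        self.target_class = target_class
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # class="rg_i extra" holds several classes, so split before comparing
        classes = (attr_map.get("class") or "").split()
        if self.target_class in classes:
            self.matches.append(attr_map)


def find_images(html, target_class="rg_i"):
    """Return a list of attribute dicts for matching <img> tags."""
    collector = ImageTagCollector(target_class)
    collector.feed(html)
    collector.close()
    return collector.matches


# Hypothetical markup, not copied from a real Google results page
SAMPLE_HTML = """
<div>
  <img class="rg_i extra" data-iurl="https://example.com/a.jpg" data-iid="a1">
  <img class="logo" src="https://example.com/logo.png">
  <img class="rg_i" data-iid="b2">
</div>
"""

if __name__ == "__main__":
    for attrs in find_images(SAMPLE_HTML):
        url = attrs.get("data-iurl")
        print(attrs.get("data-iid"), url if url else "(no data-iurl attribute)")
```

Reading attributes with `.get()` instead of indexing avoids a `KeyError` when a matching tag lacks `data-iurl`, which is a likely failure the next time the markup shifts.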
This problem did appear only recently, and there are already a number of similar GitHub issues:
- https://github.com/hardikvasa/google-images-download/pull/298
- https://github.com/hardikvasa/google-images-download/issues/301
- https://github.com/hardikvasa/google-images-download/issues/302
Unfortunately, there is no official fix yet. For now, you can use the temporary workarounds posted in those discussions.
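The reason this no longer works is that Google changed how they do everything, so you now need to include an api_key in the search string. As a result, packages such as google-images-download no longer work, even version 2.8.0, because they have no placeholder for the api_key string, which you have to register for to get 2,500 free downloads per day.
If you are willing to pay $50 per month or more for serpapi.com's service, one approach is to use the pip package google-search-results and supply your api_key as part of the query parameters.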
params = {
    "engine": "google",
    ...
    "api_key": "secret_api_key"
}
You supply the API key yourself, then call:
client = GoogleSearchResults(params)
results = client.get_dict()
This returns the JSON results as a dictionary containing links to all the image URLs, which you then download directly.
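The "download them directly" step could look roughly like the sketch below, using only the standard library. The key names `images_results` and `original` are assumptions about the shape of the results dictionary, and `extract_image_urls`, `target_filename`, and `download_all` are hypothetical helper names, so check them against what `get_dict()` actually returns for your query.

```python
import os
import urllib.request
from urllib.parse import urlparse


def extract_image_urls(results, list_key="images_results", url_key="original"):
    """Pull image URLs out of a results dict; the key names are assumptions."""
    urls = []
    for item in results.get(list_key, []):
        url = item.get(url_key) if isinstance(item, dict) else None
        if url:
            urls.append(url)
    return urls


def target_filename(url, index):
    """Build a local file name from the URL path, falling back to .jpg."""
    extension = os.path.splitext(urlparse(url).path)[1] or ".jpg"
    return f"image_{index:03d}{extension}"


def download_all(urls, folder="images", timeout=10):
    """Download each URL into folder and return the paths that succeeded."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        path = os.path.join(folder, target_filename(url, index))
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response, \
                    open(path, "wb") as handle:
                handle.write(response.read())
        except (OSError, ValueError) as error:
            # Report the failure and keep going with the remaining URLs
            print(f"skipped {url}: {error}")
            continue
        saved.append(path)
    return saved
```

With the `results` dictionary from the call above, usage would be `download_all(extract_image_urls(results))`. Unlike the bare `except: pass` in the question, failures are printed, so an empty folder at least comes with a reason.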
The google_images_download project no longer seems to be compatible with the Google API.
As an alternative, you could try simple_image_download.