Using Python and Selenium to download an image using the image's "src" attribute

Question

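I'm new to Python and Selenium. My goal is to download an image from the Google Image Search results page and save it as a file in a local directory, but I can't get past the first step of downloading the image.

I know there are other options (retrieving the image by URL with requests, etc.), but I'd like to know whether it can be done using the image's "src" attribute, e.g. "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxM..."

My code is below (I've removed all imports etc. for brevity):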

# This creates the folder to store the image in
if not os.path.exists(save_folder):
    os.mkdir(save_folder)

driver = webdriver.Chrome(PATH)

# Goes to the given web page
driver.get("https://www.google.com/imghp?hl=en&ogbl")

# "q" is the name of the google search field input
search_bar = driver.find_element_by_name("q")

# Input the search term(s)
search_bar.send_keys("Ben Folds Songs for Silverman Album Cover")

# Returns the results (basically clicks "search")
search_bar.send_keys(Keys.RETURN)

# Wait 10 seconds for the images to load on the page before moving on to the next part of the script
try:
    # Returns a single element (the container that holds the results)
    search_results = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "islrg"))
    )
    # print(search_results.text)

    # Gets all of the images on the page (it should be a list)
    images = search_results.find_elements_by_tag_name("img")

    # I just want the first result
    image = images[0].get_attribute('src')

    ### Need help here ###

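except Exception as exc:
    # Print the actual exception instead of hiding it behind a generic message
    print(f"Error: {exc}")

finally:
    # Closes the browser exactly once, whether or not an error occurred
    driver.quit()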

I tried:

urllib.request.urlretrieve(image, "00001.jpg")

and

urllib3.request.urlretrieve(image, f"{save_folder}/captcha.png")

but with both of these I always end up in the "except" block. After reading a promising post, I also tried:

bufferedImage = imageio.read(image)
outputFile = f"{save_folder}/image.png"
imageio.write(bufferedImage, "png", outputFile)

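with similar results, although I believe the example in that post used Java and I may have made a mistake translating it to Python.

I'm sure it's something obvious, but what am I doing wrong? Thanks for any help.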

Answer 1

In this case, the URL you're dealing with is a Data URL: the data of the image itself, encoded in base64.
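To make the structure of such a URL concrete, here is a minimal sketch that splits one into its media type and decoded payload. The helper name `parse_data_url` and the tiny `text/plain` example are made up for illustration; they are not part of the question's code.

```python
import base64
from urllib.parse import unquote_to_bytes


def parse_data_url(data_url):
    """Split a data URL into (media_type, is_base64, payload_bytes).

    Hypothetical helper for illustration only.
    """
    if not data_url.startswith("data:"):
        raise ValueError("not a data URL")
    # Everything before the first comma is the header, the rest is the payload.
    header, _, payload = data_url[len("data:"):].partition(",")
    parts = header.split(";")
    media_type = parts[0] or "text/plain"
    is_base64 = "base64" in parts[1:]
    if is_base64:
        raw = base64.b64decode(payload)
    else:
        # Without the base64 marker the payload is percent-encoded.
        raw = unquote_to_bytes(payload)
    return media_type, is_base64, raw


# Build a small example data URL instead of using a real image.
example = "data:text/plain;base64," + base64.b64encode(b"hello").decode("ascii")
media_type, is_base64, raw = parse_data_url(example)
```

For an image `src` like the one in the question, `media_type` would be `image/jpeg` and `raw` would hold the JPEG bytes.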

As of Python 3.4, you can read this data and decode it to bytes with urllib.request.urlopen:

import urllib.request

data_url = "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxM..."

with urllib.request.urlopen(data_url) as response:
    data = response.read()
    with open("some_image.jpg", mode="wb") as f:
        f.write(data)

Alternatively, you can decode the base64-encoded part of the data URL yourself with base64:

import base64

data_url = "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxM..."
base64_image_data = data_url.split(",")[1]
data = base64.b64decode(base64_image_data)

with open("some_image.jpg", mode="wb") as f:
    f.write(data)
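To tie this back to the script in the question, the following is a rough sketch of a helper that takes whatever `get_attribute('src')` returned and writes it to disk. It relies on `urllib.request.urlopen` accepting both `data:` and `http(s):` URLs, and picks the file extension from the reported content type. The name `save_image_src`, the `.bin` fallback, and the demo payload are assumptions made for this example.

```python
import base64
import mimetypes
import os
import tempfile
import urllib.request


def save_image_src(src, stem):
    """Save the image behind src (a data: or http(s): URL); return the file path.

    Hypothetical helper: `stem` is the output path without an extension.
    """
    with urllib.request.urlopen(src) as response:
        content_type = response.headers.get_content_type()
        data = response.read()
    # Fall back to a generic extension if the content type is unknown.
    extension = mimetypes.guess_extension(content_type) or ".bin"
    path = stem + extension
    with open(path, mode="wb") as f:
        f.write(data)
    return path


# Demonstration with made-up bytes rather than a real image.
payload = b"not a real image, just demo bytes"
demo_src = "data:image/png;base64," + base64.b64encode(payload).decode("ascii")
saved_path = save_image_src(demo_src, os.path.join(tempfile.mkdtemp(), "00001"))
```

In the question's script, the call would look something like `save_image_src(image, os.path.join(save_folder, "00001"))` at the point marked "Need help here".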
