Scraping images from a web page using Selenium in Python?

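Someone on another platform asked for help scraping images from a website. The images all load on the same page. The only approach I could find is to use Selenium to load every image on the page, extract each image URL, then open each image in a new tab and download it. That is very resource-intensive, and in some cases there are more than 200003 images. I am a beginner, though I have a decent background in web design. Is there a better technique for scraping images? Note: I am not doing this for money; I am just practicing new skills.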

https://generated.photos/faces/natural/front-facing/young-adult/white-race/brown-hair/short/joy/female/brown-eyes

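Stack Overflow is not a code-writing service, but fetching the images is simple.

1. Import the modules [requests, BeautifulSoup].
2. Get the page source.
3. Find the div tag that holds the images [optional step].
4. Get the img tags inside that div.
5. Get the src attribute from each img tag.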

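import requests
from bs4 import BeautifulSoup

url = 'https://generated.photos/faces/natural/front-facing/young-adult/white-race/brown-hair/short/joy/female/brown-eyes'
r = requests.get(url)
# Fail early on an HTTP error instead of parsing an error page
r.raise_for_status()
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(r.content, 'html.parser')
# The div that holds the image grid
di = soup.find('div', attrs={'class': 'grid-photos'})
im = di.find_all('img')
# Keep only the img tags that actually carry a src attribute
links = [i['src'] for i in im if i.get('src')]
print(links)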
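
The src values collected this way are not guaranteed to be absolute or unique. Here is a minimal sketch of cleaning them up, assuming some of them are relative paths or repeats. The helper name `absolute_image_urls` and the example.com URLs are made up for illustration and do not come from the site above.

```python
from urllib.parse import urljoin


def absolute_image_urls(page_url, srcs):
    """Resolve src values against the page URL and drop duplicates, keeping order."""
    seen = set()
    result = []
    for src in srcs:
        # urljoin leaves absolute URLs untouched and resolves relative ones
        full = urljoin(page_url, src)
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


print(absolute_image_urls(
    'https://example.com/gallery/page1',
    ['/img/1.jpg', 'thumb/2.jpg', 'https://cdn.example.com/3.jpg', '/img/1.jpg'],
))
```

With the code above, this could be called as `absolute_image_urls(url, links)` before downloading anything.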

AOA Muhammad, here is the code; you can follow it to extract all the images.

# Import modules (BeautifulSoup is not needed here, the API returns JSON)
import requests
import json

#define headers
headers = {
    'authority': 'api.generated.photos',
    'sec-ch-ua': '^\^Google',
    'accept': 'application/json, text/plain, */*',
    'authorization': 'API-Key Cph30qkLrdJDkjW-THCeyA',
    'sec-ch-ua-mobile': '?0',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36',
    'origin': 'https://generated.photos',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://generated.photos/',
    'accept-language': 'en-PK,en-US;q=0.9,en;q=0.8',
    'cookie': 'gp_session=BAh7B0kiD3Nlc3Npb25faWQGOgZFVG86HVJhY2s6OlNlc3Npb246OlNlc3Npb25JZAY6D0BwdWJsaWNfaWRJIkViMzUzYjQ3MTYyOTNjMzdkOTE2OTU4MzZkNzAxNjUyODY1MjU3NTExOTNlNzhmYjY2NDMyOTY1MDEyNjkxMDZiBjsARkkiDGNhcnRfaWQGOwBGSSIdNjA3YTg0YTdjN2VjMzEwMDBjZDY3ZGU3BjsAVA%3D%3D--038eee55b343dcdd77021c6b3494a8111809032d; _ga=GA1.2.1963701744.1618642096; _gid=GA1.2.180857723.1618642096; _gat=1',
}

#define the filters
filters = {
    'order_by': 'latest',
    'page': '1',
    'per_page': '30',
    'face': 'natural',
    'head_pose': 'front-facing',
    'age': 'young-adult',
    'ethnicity': 'white',
    'hair_color': 'brown',
    'hair_length': 'short',
    'emotion':'joy',
    'gender':'female',
    'eye_color': 'brown',
}

# Now send the requests to the API

api = "https://api.generated.photos/api/frontend/v1/images"
image_url = []
# Loop over the result pages; range(1, 687) covers pages 1 to 686
for i in range(1, 687):
    # Reuse the filters defined above, changing only the page number
    params = dict(filters, page=str(i))
    response = requests.get(api, headers=headers, params=params)
    # Parse the JSON body of the response
    json_res = json.loads(response.content)
    image = json_res['images']
    for url in image:
        image_url.append(url['thumb_url'])


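# Download the images, giving each one its own file name
for n, url in enumerate(image_url, start=1):
    img_content = requests.get(url).content
    # A fixed name such as 'Image.jpg' would be overwritten on every iteration
    with open(f'Image_{n}.jpg', 'wb') as fh:
        fh.write(img_content)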
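
Each download needs its own file name, otherwise later images replace earlier ones on disk. One possible way to get distinct names, sketched here, is to combine a counter with the last path segment of the image URL. The helper name `filename_for`, the `image.jpg` fallback, and the example URL are assumptions for illustration only.

```python
import posixpath
from urllib.parse import urlparse


def filename_for(url, index):
    """Build a distinct file name from a counter and the URL's last path segment."""
    # urlparse keeps the query string out of .path, so '?w=1' stays out of the name
    name = posixpath.basename(urlparse(url).path)
    if not name:
        # Fall back to a generic name when the URL path ends with '/'
        name = 'image.jpg'
    # Zero-padded counter keeps the files sorted in download order
    return f'{index:05d}_{name}'


print(filename_for('https://example.com/faces/photo.jpg?w=1', 3))
```

Inside a download loop this could be used as `open(filename_for(url, n), 'wb')`, with `n` coming from `enumerate`.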

P.S. Keep in mind that this will take a lot of time, so if you are only practicing, you can change the range to something like (1, 4).
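
A rough sketch of how the page limit could be made a parameter, so a practice run only touches a few pages. The `images` and `thumb_url` keys are the ones used in the code above. The function names are hypothetical, and the idea that a page past the end returns an empty `images` list is an assumption, not something confirmed for this API. A fake fetcher stands in for the real request so the example runs without network access.

```python
def collect_thumb_urls(fetch_page, max_pages):
    """Collect 'thumb_url' values page by page, stopping early on an empty page."""
    urls = []
    for page in range(1, max_pages + 1):
        data = fetch_page(page)
        images = data.get('images', [])
        if not images:
            # Assumption: an empty page means there is nothing more to fetch
            break
        urls.extend(item['thumb_url'] for item in images)
    return urls


def fake_fetch(page):
    """Stand-in for the real API call: two pages of three images, then nothing."""
    if page > 2:
        return {'images': []}
    return {'images': [{'thumb_url': f'https://example.com/p{page}_{k}.jpg'}
                       for k in range(3)]}


print(len(collect_thumb_urls(fake_fetch, max_pages=3)))
```

For the real site, `fetch_page` would wrap the `requests.get(...)` call from the code above and return the parsed JSON.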