BeautifulSoup, scraping: get the image size when it isn't in the tags?

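Target: http://voorraadmodule.vwe-advertentiemanager.nl/s9376368b43e8fd6a8025bfa284d8e732/e7c2/stock/vehicles/100/ I've been learning Python for 8 days and really like it. The target page belongs to my old employer, and to test my skills I want to write a Python program that checks his stock every day and reports the changes (sold, price reductions, etc.).

I can get all the parameters I want, except for a marker/trigger for cars that have been sold but not yet removed from the page.

When you visit the target page, you'll see that some images carry a ribbon reading "verkocht" (Dutch for "sold"). I searched all of the HTML and found nothing that indicates whether a car is sold; the CMS only swaps in a picture with the ribbon. I did notice that the thumbnail changes size when this happens, so I'm hoping that can be my trigger.

Partial code: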

from bs4 import BeautifulSoup
import requests

url = "http://voorraadmodule.vwe-advertentiemanager.nl/s9376368b43e8fd6a8025bfa284d8e732/e7c2/stock/vehicles/100/"
img_pre_url = "http://voorraadmodule.vwe-advertentiemanager.nl/s4c74bf131813e9d7d3232b46224830a2"
getpage = requests.get(url)
soup = BeautifulSoup(getpage.text, "html.parser")

for listingparse in soup.find_all("div", class_="row clearfix "):

    ftch_id = listingparse.get("id")[8:]

    ftch_imgurl = listingparse.find("div", class_="columnPhoto").img["src"]

    print("List id: " + ftch_id + " Image url: " + img_pre_url + ftch_imgurl)

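I trimmed the code down to this part for demonstration; in the original version I write it to a CSV file along with more parameters.

The end goal is a variable 'sold_marker: V' for sold cars or 'sold_marker: X' for current listings.

As a noob, I think I have 2 options:
1. Download the image and measure its size with numpy.
2. Use some image-processing library to detect whether the ribbon is present, by looking for its ugly green color.

How would you approach this? I'd rather not download the images every day just to measure them, but I guess there's no other choice.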

The image size is in its URL, so you can split the URL on "/" and take the size from the resulting list.

from bs4 import BeautifulSoup
import requests

url = "http://voorraadmodule.vwe-advertentiemanager.nl/s9376368b43e8fd6a8025bfa284d8e732/e7c2/stock/vehicles/100/"
img_pre_url = "http://voorraadmodule.vwe-advertentiemanager.nl/s4c74bf131813e9d7d3232b46224830a2"
getpage = requests.get(url)
soup = BeautifulSoup(getpage.text, "html.parser")

for listingparse in soup.find_all("div", class_="row clearfix "):

    ftch_id = listingparse.get("id")[8:]
    ftch_imgurl = listingparse.find("div", class_="columnPhoto").img["src"]

    url_parts = ftch_imgurl.split('/')

    if url_parts[-2] == "260x195":
        verkocht = "verkocht"
    else:
        verkocht = ""

    print("List id:", ftch_id)
    print("Image url:", img_pre_url+ftch_imgurl)
    print("image size:", url_parts[-2], verkocht)
    print('---')

Result:

List id: 15668794
Image url: http://voorraadmodule.vwe-advertentiemanager.nl/s4c74bf131813e9d7d3232b46224830a2/vehicle-images/15668794/1/1513180329/320x213/citroen-xsara-picasso-1-6i-attraction-zeer-ruime-gezinsauto
image size: 320x213 
---
List id: 15529833
Image url: http://voorraadmodule.vwe-advertentiemanager.nl/s4c74bf131813e9d7d3232b46224830a2/vehicle-images/15529833/1/1512131899/260x195/dacia-logan-mcv-1-6-laureate-zeer-ruime-buitenkans
image size: 260x195 verkocht
---
List id: 15427090
Image url: http://voorraadmodule.vwe-advertentiemanager.nl/s4c74bf131813e9d7d3232b46224830a2/vehicle-images/15427090/1/1510153600/320x213/fiat-punto-evo-1-3-m-jet-dynamic
image size: 320x213 
---
List id: 15287283
Image url: http://voorraadmodule.vwe-advertentiemanager.nl/s4c74bf131813e9d7d3232b46224830a2/vehicle-images/15287283/1/1508421733/320x213/hyundai-matrix-1-6i-active-ek-2008-automaat-parkeersensoor-achter
image size: 320x213 
---
List id: 15218532
Image url: http://voorraadmodule.vwe-advertentiemanager.nl/s4c74bf131813e9d7d3232b46224830a2/vehicle-images/15218532/1/1513263561/260x195/land-rover-range-rover-sport-3-6-tdv8-hse-vol-met-opties
image size: 260x195 verkocht
---
List id: 13888171
Image url: http://voorraadmodule.vwe-advertentiemanager.nl/s4c74bf131813e9d7d3232b46224830a2/vehicle-images/13888171/1/1491479399/320x213/maserati-quattroporte-4-7-s
image size: 320x213 
---
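
To get from here to the 'sold_marker' value the question asks for, the size check can be wrapped in a small helper. This is only a sketch: the names `image_size`, `sold_marker`, and `SOLD_SIZE` are made up for illustration, and it assumes (based solely on the output above) that sold cars are the ones served with a 260x195 thumbnail. It uses a regular expression instead of `split('/')` so it does not depend on the size being the second-to-last path segment.

```python
import re

# Assumption: thumbnails of sold cars are served at 260x195,
# as seen in the output above. Adjust if the site changes this.
SOLD_SIZE = (260, 195)

# Matches a whole path segment such as "/320x213/".
SIZE_RE = re.compile(r"/(\d+)x(\d+)/")


def image_size(img_url):
    """Return (width, height) parsed from the URL, or None if no size segment is found."""
    match = SIZE_RE.search(img_url)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def sold_marker(img_url):
    """Return "V" for a sold listing and "X" for anything else."""
    return "V" if image_size(img_url) == SOLD_SIZE else "X"


if __name__ == "__main__":
    # One of the relative image URLs from the output above.
    example = "/vehicle-images/15529833/1/1512131899/260x195/dacia-logan-mcv-1-6-laureate-zeer-ruime-buitenkans"
    print("sold_marker:", sold_marker(example))
```

Inside the loop you would call `sold_marker(ftch_imgurl)` and write the result to the CSV with the other fields. Note that a URL with no recognizable size is reported as "X" here; if you would rather notice that case, check `image_size(...)` for `None` first.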