Cannot scrape the correct aspect ratio of the image - Python

Question

I'm having trouble extracting images from a "manga" website using Python. Below is an example of the element on the site:

<img id="comic" class="loading" onerror="this.src='data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'; this.removeAttribute('onerror'); this.className = 'loaderror';" src="https://example_on_the_image.jpg">
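For reference, the `src` of an element like this can be pulled out with the standard library alone. This is only a minimal sketch: it uses `html.parser` rather than BeautifulSoup, assumes the element's id is `comic` (matching the `#comic` selector used in Answer 2), and `ImgSrcParser` is a hypothetical helper name, not something from the site or any library.

```python
from html.parser import HTMLParser
from typing import Optional


class ImgSrcParser(HTMLParser):
    """Hypothetical helper: records the src of the first <img> with a given id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Ignore non-image tags and anything after the first match.
        if tag != "img" or self.src is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("id") == self.target_id:
            self.src = attr_map.get("src")


# Shortened copy of the element from the question (onerror handler omitted).
html = '<img id="comic" class="loading" src="https://example_on_the_image.jpg">'
parser = ImgSrcParser("comic")
parser.feed(html)
print(parser.src)
```

Note that this only reads the URL from the static HTML. It says nothing about what the server returns when that URL is requested outside a browser.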

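I was able to parse out the "src" link, and when the image is viewed in a normal browser its dimensions are as follows:

Rendered size: 920 × 1301 px
Rendered aspect ratio: 920∶1301
Intrinsic size: 720 × 1018 px
Intrinsic aspect ratio: 360∶509
File size: 101 kB
Current source: (the image URL)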
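The ratios in that list are just the pixel dimensions reduced by their greatest common divisor. A small sketch of the arithmetic (the function name `aspect_ratio` is my own, chosen for illustration):

```python
from math import gcd


def aspect_ratio(width: int, height: int):
    """Reduce width:height to lowest terms."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


print(aspect_ratio(720, 1018))  # (360, 509), the intrinsic ratio
print(aspect_ratio(920, 1301))  # (920, 1301), already in lowest terms
print(aspect_ratio(160, 160))   # (1, 1), a square
```

A 160 × 160 file reduces to 1∶1, so it cannot be a scaled-down copy of the 720 × 1018 image. That suggests the download is a different resource altogether, although the arithmetic alone does not say which one.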

However, the image I download comes out as 160 x 160 px, with a smaller file size. I have tried BeautifulSoup, Selenium, and others, and still get the same result.

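But if I use either of the following:

the browser, right-clicking the image and choosing "Save image as"
Inspect -> on the image element -> right-click -> Capture node screenshot

then I can save the image at its rendered size from a normal browser. Why can't I get the correct aspect ratio when scraping with Python?

I hope someone can point out where I am going wrong. Thanks.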

Answer 1

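Solved it. Selenium could not screenshot the element at its full rendered size, but Playwright lets me take the screenshot at the correct aspect ratio, as displayed once the browser has loaded the page.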

Answer 2

Here is my Playwright code:

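    import os

    from playwright.sync_api import sync_playwright

    manga_url = "the url that u going to scrape"  # placeholder: page to scrape
    dwn_path = "your_directory"  # placeholder: folder to save the screenshot in
    os.chdir(dwn_path)

    with sync_playwright() as p:
        # Launch a visible Chromium window; slow_mo delays each action by 500 ms.
        browser = p.chromium.launch(headless=False, slow_mo=500)
        page = browser.new_page()
        page.goto(manga_url)
        # Screenshot only the element with id="comic", as rendered on the page.
        page.locator("#comic").screenshot(path="screenshot.png")
        print(page.title())
        browser.close()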
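To check that the saved screenshot really has the size you expect, you can read the width and height straight from the PNG header. This is a rough sketch using only the standard library; `png_size` is a name I made up, and it assumes the file is a standard PNG whose first chunk is IHDR, as the PNG format requires.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


# Usage with the file written by the script above:
# with open("screenshot.png", "rb") as f:
#     print(png_size(f.read()))
```

The pixel size of an element screenshot can differ from the CSS "rendered size" if the browser context uses a device scale factor other than 1, so compare the ratio rather than the exact numbers.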

image

css-selectors

web-scraping

python-3.x

selenium-webdriver