How to extract <figure> images using Selenium in Python?
I'm trying to extract an image, located by XPath, from this App Store page: https://apps.apple.com/us/app/mercer-marketplace-benefits/id1041417557
I tried the following code using the XPath:
driver.get('https://apps.apple.com/us/app/mercer-marketplace-benefits/id1041417557')
rating_distr = WebDriverWait(driver,30).until(EC.presence_of_element_located((By.XPATH, """(//*[@id="ember290"]/div/div[2])""")))
print(rating_distr.get_attribute('innerHTML'))
But the output is not an image:
<figure class="we-star-bar-graph">
<div class="we-star-bar-graph__row">
<span class="we-star-bar-graph__stars we-star-bar-graph__stars--5"></span>
<div class="we-star-bar-graph__bar">
<div class="we-star-bar-graph__bar__foreground-bar" style="width: 76%;"></div>
</div>
</div>
<div class="we-star-bar-graph__row">
<span class="we-star-bar-graph__stars we-star-bar-graph__stars--4"></span>
<div class="we-star-bar-graph__bar">
<div class="we-star-bar-graph__bar__foreground-bar" style="width: 12%;"></div>
Is there a way to extract the output as an image? Thanks for your help!
Open the page and scroll to the element with that id. I checked the id, and the part of the page you want is "ember290".
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
import pyscreenshot as ImageGrab
browser = webdriver.Chrome()  # we are using Chrome as our web browser
browser.get('https://apps.apple.com/us/app/mercer-marketplace-benefits/id1041417557')
# Scroll the element with id "ember290" into view
# (find_element_by_id was removed in Selenium 4; use find_element(By.ID, ...))
ActionChains(browser).move_to_element(browser.find_element(By.ID, 'ember290')).perform()
# Grab the whole screen
im = ImageGrab.grab()
im.show()
# Grab only a region; bbox is (x1, y1, x2, y2), so x2 > x1 and y2 > y1 are required.
# The original (162, 650, 500, 500) is read here as x, y, width, height; adjust for your screen.
im = ImageGrab.grab(bbox=(162, 650, 662, 1150))
im.show()
# Save the cropped region
im.save('im.png')
Take the screenshot once scrolling has finished.
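If you would rather not crop a desktop screenshot by hand, a Selenium WebElement can also return a PNG of just itself through its `screenshot_as_png` property. Below is a minimal sketch of that idea, not a tested solution for this page. The helper names `save_element_png` and `capture_rating_chart` and the output file name are made up for illustration, and the `figure.we-star-bar-graph` selector is taken from the HTML in the question on the assumption that an id like "ember290" is generated and may change between page loads.

```python
from pathlib import Path


def save_element_png(element, out_path):
    """Write a PNG screenshot of a single element to out_path.

    `element` is expected to be a Selenium WebElement; only its
    `screenshot_as_png` property (PNG bytes) is used here.
    """
    png_bytes = element.screenshot_as_png
    path = Path(out_path)
    path.write_bytes(png_bytes)
    return path


def capture_rating_chart(url, out_path="rating_chart.png"):
    """Hypothetical helper: open the page and save the <figure> as a PNG."""
    # Imported here so save_element_png stays usable without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Assumption: locate by the class from the question's HTML
        # rather than by the "ember290" id.
        figure = WebDriverWait(driver, 30).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "figure.we-star-bar-graph")
            )
        )
        # Bring the chart into the viewport before capturing it.
        driver.execute_script(
            "arguments[0].scrollIntoView({block: 'center'});", figure
        )
        return save_element_png(figure, out_path)
    finally:
        driver.quit()
```

Calling `capture_rating_chart('https://apps.apple.com/us/app/mercer-marketplace-benefits/id1041417557')` should then leave a PNG of the bar graph in the working directory, provided the selector still matches the live page.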
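As I suggested in my comment, I think the better/faster approach is to just grab the values rather than take a screenshot. If you take a screenshot, someone has to open it manually and record the values from it in some other format, which would be a long and tedious process. Instead, just scrape the data from the page and dump it in whatever final format you need.
For example, if you look at the HTML for just the 5-star rating bar: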
<div class="we-star-bar-graph__row">
<span class="we-star-bar-graph__stars we-star-bar-graph__stars--5"></span>
<div class="we-star-bar-graph__bar">
<div class="we-star-bar-graph__bar__foreground-bar" style="width: 76%;"></div>
</div>
</div>
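You can see that the class we-star-bar-graph__stars--5 is applied, which indicates which star rating the row is for. You can also see that the bar's width is set with style="width: 76%;", which tells you the percentage of 5-star ratings. With this information, we can scrape the rating for each star level.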
from selenium.webdriver.common.by import By
ratings = driver.find_elements(By.CSS_SELECTOR, "figure.we-star-bar-graph div.we-star-bar-graph__bar__foreground-bar")
# get the width of the entire bar in pixels (the computed value looks like "123px")
width = float(driver.find_element(By.CSS_SELECTOR, ".we-star-bar-graph__bar").value_of_css_property("width")[:-2])
for i in range(len(ratings), 0, -1):
    # get the width of this rating's bar in pixels
    rating = float(ratings[len(ratings) - i].value_of_css_property("width")[:-2])
    # round so the ratio of pixel widths prints as a whole percentage
    print(str(i) + "-star rating: " + str(round(rating / width * 100)) + "%")
This should print values like:
5-star rating: 76%
4-star rating: 12%
3-star rating: 4%
2-star rating: 1%
1-star rating: 6%
This may not be the final format you want, but it should point you in the right direction.
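Since the percentage is already written into each bar's style attribute, another option is to read it straight from the markup instead of comparing pixel widths. Here is a rough sketch using only the standard library. It assumes you already have the innerHTML string printed in the question (for example from `rating_distr.get_attribute('innerHTML')`), and `StarBarParser` and `parse_star_ratings` are names made up for illustration. The class names come from the question's HTML and may change if Apple updates the page.

```python
import re
from html.parser import HTMLParser


class StarBarParser(HTMLParser):
    """Collect {stars: percent} pairs from the star bar graph markup."""

    def __init__(self):
        super().__init__()
        self.ratings = {}
        self._current_stars = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = attrs.get("class") or ""
        # e.g. class="we-star-bar-graph__stars we-star-bar-graph__stars--5"
        star_match = re.search(r"we-star-bar-graph__stars--(\d)", classes)
        if star_match:
            self._current_stars = int(star_match.group(1))
            return
        # e.g. class="we-star-bar-graph__bar__foreground-bar" style="width: 76%;"
        is_bar = "we-star-bar-graph__bar__foreground-bar" in classes.split()
        if is_bar and self._current_stars is not None:
            width_match = re.search(r"width:\s*([\d.]+)%", attrs.get("style") or "")
            if width_match:
                self.ratings[self._current_stars] = float(width_match.group(1))
            self._current_stars = None


def parse_star_ratings(inner_html):
    """Return a dict mapping star count to percentage."""
    parser = StarBarParser()
    parser.feed(inner_html)
    return parser.ratings


# The (truncated) markup from the question, used as sample input.
sample = """
<figure class="we-star-bar-graph">
<div class="we-star-bar-graph__row">
<span class="we-star-bar-graph__stars we-star-bar-graph__stars--5"></span>
<div class="we-star-bar-graph__bar">
<div class="we-star-bar-graph__bar__foreground-bar" style="width: 76%;"></div>
</div>
</div>
<div class="we-star-bar-graph__row">
<span class="we-star-bar-graph__stars we-star-bar-graph__stars--4"></span>
<div class="we-star-bar-graph__bar">
<div class="we-star-bar-graph__bar__foreground-bar" style="width: 12%;"></div>
"""

print(parse_star_ratings(sample))  # {5: 76.0, 4: 12.0}
```

The resulting dict is easy to hand to `csv` or `json` if you need a file at the end.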