How to get the style value in a div tag in Python?
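I want to scrape the images on a single web page. When I inspect the page, the image URL is inside a div tag as the value of its style attribute, like this: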
<div class="v-image__image v-image__image--cover" style="background-image: url("https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/f3ea4910e239eb704af755c65f548e35_car.png"); background-position: center center;"></div>
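I want to get: https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/f3ea4910e239eb704af755c65f548e35_car.png
But when I try ChromeDriver's find element or BeautifulSoup's soup.find, they return an empty list, because there is no text between the div tags.
I'm looking for a way to get at what is inside the div tag itself, not what is between its opening and closing tags.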
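The URL lives in the div's style attribute, so it has to be read as an attribute rather than as element text. A minimal sketch using only the standard library's html.parser (in BeautifulSoup, tag.get("style") plays the same role). The HTML string is the question's snippet written the way a browser serializes it, with the inner quotes escaped as &quot;. The class name StyleCollector is made up for illustration. Note that the v-image__image class names look like a client-side (Vuetify) component, so the div may only exist after JavaScript has run; this kind of parsing is only expected to work on rendered HTML (for example Selenium's driver.page_source), not necessarily on the raw response of a plain HTTP request.

```python
from html.parser import HTMLParser

# The question's div, serialized the way a browser writes it out
# (inner double quotes escaped as &quot;).
HTML = (
    '<div class="v-image__image v-image__image--cover" '
    'style="background-image: url(&quot;https://mashinbank.com/api/parse/files/'
    '7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/'
    'f3ea4910e239eb704af755c65f548e35_car.png&quot;); '
    'background-position: center center;"></div>'
)


class StyleCollector(HTMLParser):
    """Hypothetical helper: collect the style attribute of every div with a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.styles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values arrive already unescaped
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and self.css_class in classes:
            self.styles.append(attrs.get("style") or "")


parser = StyleCollector("v-image__image--cover")
parser.feed(HTML)
print(parser.styles)
```

The div has no text content, but its attributes are still available at parse time, which is why reading the attribute works where reading the text comes back empty.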
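A Selenium solution to this problem:
You should probably wait for the element to become visible, and only then read its attribute.
Then split the whole style attribute to get the URL value.
Like this: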
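wait = WebDriverWait(driver, 20)
element = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@style,'https://mashinbank.com/api/parse/files')]")))
style_content = element.get_attribute("style")
# style_content looks like: background-image: url("https://..."); background-position: center center;
# keep the text between "url(" and ")" and drop the surrounding quotes
url = style_content.split("url(")[1].split(")")[0].strip("\"'")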
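Splitting the style string on fixed characters depends on its exact layout. A sketch that instead searches for the url(...) token with a regular expression, accepting double, single, or no quotes around the address. The function name extract_background_url is only an illustration, and its input is assumed to be the string returned by get_attribute("style").

```python
import re

# Matches url(...), with an optional matching pair of ' or " around the address.
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""")


def extract_background_url(style):
    """Hypothetical helper: return the first url(...) value in a CSS style string, or None."""
    match = URL_RE.search(style or "")
    return match.group(2) if match else None


style = ('background-image: url("https://mashinbank.com/api/parse/files/'
         '7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/'
         'f3ea4910e239eb704af755c65f548e35_car.png"); '
         'background-position: center center;')
print(extract_background_url(style))
```

Inside a Selenium loop this would take the place of the split chain, e.g. url = extract_background_url(element.get_attribute("style")).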
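To get all the images, wait for the presence of all the elements instead (presence_of_all_elements_located).
Once you have them as a Python list, e.g. all_images below, you can strip the parentheses and the double quotes from each style value, as shown below.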
Sample code:
# driver_path: path to your chromedriver executable (Selenium 4 style)
driver = webdriver.Chrome(service=Service(driver_path))
driver.maximize_window()
wait = WebDriverWait(driver, 20)
links = []
driver.get("https://mashinbank.com/ad/GkbI20tzp3/%D8%AE%D8%B1%DB%8C%D8%AF-%D9%BE%D8%B1%D8%A7%DB%8C%D8%AF-111-SE-1397")
# wait until all the image divs are present in the DOM
all_images = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@class,'v-image__image--cover')]")))
for image in all_images:
    style = image.get_attribute('style')
    # 'background-image: url("...")' -> text between ( and ), without the quotes
    link = style.split("(")[1].split(")")[0].replace('"', '')
    links.append(link)
print(links)
Imports:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Output:
['https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/cabdf9f3f379e5b839300f89a90ab27e_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/e1c6c75dda980a6b4b4a83932ed49832_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/81ef7c57ca349485a9ba78bf0e42e13f_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/02bd13f2c5ce936ec3db10706c03854d_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/cabdf9f3f379e5b839300f89a90ab27e_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/e1c6c75dda980a6b4b4a83932ed49832_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/81ef7c57ca349485a9ba78bf0e42e13f_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/02bd13f2c5ce936ec3db10706c03854d_car.png']
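The output above lists each URL twice, presumably because the page renders the same images in more than one place (a guess, not verified against the page). If only distinct images are wanted, a small sketch that drops repeats while keeping the original order, relying on dict preserving insertion order (Python 3.7+). BASE and names are just a compact way to rebuild the list printed above.

```python
BASE = "https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/"
names = [
    "cabdf9f3f379e5b839300f89a90ab27e_car.png",
    "e1c6c75dda980a6b4b4a83932ed49832_car.png",
    "81ef7c57ca349485a9ba78bf0e42e13f_car.png",
    "02bd13f2c5ce936ec3db10706c03854d_car.png",
]
# Same shape as the scraped list: the four URLs, then the same four again.
links = [BASE + name for name in names] * 2

# dict keys are unique and keep insertion order, so this removes
# duplicates without reordering the first occurrences.
unique_links = list(dict.fromkeys(links))
print(len(links), len(unique_links))  # 8 4
```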