Trying to scrape data from a hover popup with Selenium and Python

Question

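I can't seem to select the correct element to trigger the Selenium hover action. I can log in using cookies, see the page load, and scroll down, but I must be selecting the wrong XPath element, because the hover never completes and I can't get the data from the popup I want. Without logging in, you won't see the table that I see, so I'll put some HTML below, along with the error message I receive and a screenshot: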

from selenium import webdriver
from selenium.webdriver.firefox import firefox_profile
from selenium.webdriver.common.action_chains import ActionChains

from selenium.webdriver.common.keys import Keys
import numpy as np
import pandas as pd
import re

url = 'https://www.oddsportal.com/tennis/czech-republic/wta-ostrava/dodin-oceane-linette-magda-h4MPchI4/'

fp = webdriver.FirefoxProfile('/Users/Frontwing/Library/Application Support/Firefox/Profiles/zku6ulmv.ProxyTest')
driver = webdriver.Firefox(executable_path="config/geckodriver",firefox_profile=fp)
driver.get(url)

items = driver.find_element_by_xpath("/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[2]/div[1]/div[7]/div[3]") \
.get_attribute("innerHTML") 



def get_cash_data():
    # I. Get the raw data by hovering and collecting
    xpath = "/html/body/div[1]/div/div[2]/div[6]/div[1]/div/div[1]/div[2]/div[1]/div[7]/div[3]"
    
    data = driver.find_element_by_xpath(xpath)
    driver.execute_script("arguments[0].scrollIntoView();", data)
    hov = ActionChains(driver).move_to_element(data)
    

    hov.perform()
    data_in_the_bubble = driver.find_element_by_xpath(xpath)
    hover_data = data_in_the_bubble.get_attribute("innerHTML")

    # II. Extract opening odds
    b = re.split('<br>', hover_data)
    #c = re.split('\(([^()]\d+)\)', b)
    #opening_odd = c

    #print(opening_odd)
    print(hover_data)
    return(b)

#print([x.text for x in items])
get_cash_data()

Here is the HTML snippet for the target I want to hover over:

<td class="right odds up"><div onmouseout="delayHideTip()" onmouseover="page.hist(this,'E-0.00-0-0','55fmkx2s5a4x0xdlo2u',44,event,0,1)" style="display: block;">2.42<br>(42)</div></td>

I thought XPath might be an easier way to pinpoint the element, but maybe not? Here is the error message:

File "opstest.py", line 44, in get_cash_data()
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: .right odds

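In particular, I suspect the problem is with the XPath, and possibly with the hover workflow as well. I must be doing something wrong, or doing it in the wrong order.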

Answer 1

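I am not logged in here, but I can see all the tr and td elements. They may differ from what you see, but the logic for scraping the tooltip stays the same.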

Launch the browser in full-screen mode.
Use explicit waits for better stability.
Prefer CSS selectors over XPath.
Always use relative XPath rather than absolute XPath.
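
A side note on the `.right odds` in the question's error message: that is most likely what a class-name lookup becomes when the value contains a space. Selenium turns the class name into the CSS selector `.right odds`, which matches an `odds` tag inside an element with class `right`, so nothing is found. In a CSS selector, several classes on the same element have to be chained with dots. The helper below is only a sketch; `css_for_classes` is a hypothetical name, not part of Selenium, and the `div[onmouseover]` part is an assumption based on the HTML snippet in the question.

```python
def css_for_classes(tag: str, class_attr: str) -> str:
    """Build a CSS selector that requires every class listed in a class attribute."""
    classes = class_attr.split()
    return tag + "".join(f".{name}" for name in classes)


# The <td> in the question carries three classes: "right odds up".
selector = css_for_classes("td", "right odds up")
print(selector)

# The hover handler sits on the <div> inside that cell, so target the div.
# "up" is left out on the assumption that it varies from cell to cell.
hover_target = css_for_classes("td", "right odds") + " > div[onmouseover]"
print(hover_target)
```

The resulting string can be passed to `driver.find_element(By.CSS_SELECTOR, hover_target)` or to an explicit wait, which keeps the locator short and independent of the page's absolute layout.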

Sample code:

# driver_path is assumed to hold the path to your chromedriver binary
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
wait = WebDriverWait(driver, 20)
driver.get("https://www.oddsportal.com/tennis/czech-republic/wta-ostrava/dodin-oceane-linette-magda-h4MPchI4/")
# Wait until the first odds cell is visible, then hover over it
first_td = wait.until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='lo odd']/td[2]")))
ActionChains(driver).move_to_element(first_td).perform()
# The hover fills span#tooltiptext; wait for it to appear before reading its text
tooltip = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span#tooltiptext")))
tool_tip_text_container = tooltip.get_attribute('innerText')
print(tool_tip_text_container)

Imports:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains

Output:

20 Sep, 17:42 2.30 +0.03
20 Sep, 17:29 2.27 +0.02
20 Sep, 17:25 2.25 -0.20
20 Sep, 17:24 2.45 +0.20
20 Sep, 17:20 2.25 -0.20
20 Sep, 17:19 2.45 +0.20
20 Sep, 16:58 2.25 -0.20
20 Sep, 16:56 2.45 +0.20
20 Sep, 16:30 2.25 -0.20
20 Sep, 16:29 2.45 +0.18
20 Sep, 16:23 2.27 -0.18
20 Sep, 16:21 2.45 +0.20
20 Sep, 15:52 2.25 -0.20
20 Sep, 15:51 2.45 +0.20
20 Sep, 15:45 2.25 -0.20
20 Sep, 15:42 2.45 +0.20
20 Sep, 15:41 2.25 -0.20
20 Sep, 15:36 2.45 +0.12
20 Sep, 15:16 2.33 -0.12
20 Sep, 15:14 2.45 +0.12

Opening odds:
19 Sep, 19:00 2.33
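
Since the question's step II is about extracting the opening odds, here is a minimal sketch of how the tooltip text could be parsed once it has been read. It assumes the text always follows the layout of the output above (date, time, odds, optional change, then an "Opening odds:" section); `OddsChange` and `parse_tooltip` are hypothetical names introduced for illustration.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class OddsChange(NamedTuple):
    timestamp: str            # e.g. "20 Sep, 17:42" (the tooltip shows no year)
    odds: float
    change: Optional[float]   # None for the opening line, which has no delta


# date, time, odds, and an optional signed change at the end of the line
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2} \w{3}, \d{2}:\d{2})\s+"
    r"(?P<odds>\d+\.\d+)"
    r"(?:\s+(?P<delta>[+-]\d+\.\d+))?$"
)


def parse_tooltip(text: str) -> Tuple[List[OddsChange], Optional[OddsChange]]:
    """Split tooltip text into (history rows, opening row)."""
    history: List[OddsChange] = []
    opening: Optional[OddsChange] = None
    in_opening = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.lower().startswith("opening odds"):
            in_opening = True
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything that does not look like an odds line
        delta = match.group("delta")
        row = OddsChange(
            match.group("ts"),
            float(match.group("odds")),
            float(delta) if delta else None,
        )
        if in_opening:
            opening = row
        else:
            history.append(row)
    return history, opening


# Shortened sample taken from the output above
sample = """20 Sep, 17:42 2.30 +0.03
20 Sep, 17:29 2.27 +0.02

Opening odds:
19 Sep, 19:00 2.33"""

history, opening = parse_tooltip(sample)
print(history[0])
print(opening)
```

In the sample code above, `tool_tip_text_container` would take the place of `sample`. Lines that do not match the expected pattern are skipped rather than raising, which seems the safer choice if the site changes its tooltip wording.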
