Extract a specific element of a table with Selenium WebDriver
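Hello, I am trying to extract some elements from this website: https://www.oddsportal.com/basketball/italy/lega-a-super-cup/sassari-brindisi-rTJFaIyk/
I want the highest odds for the home team and the away team. These values are at the end of the table: 1.31 and 4.57.
Here is my script: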
#!/usr/bin/python3
# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from concurrent.futures import ThreadPoolExecutor

options = Options()
options.headless = True
options.add_argument("window-size=1400,800")
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("start-maximized")
options.add_argument("enable-automation")
options.add_argument("--disable-infobars")
options.add_argument("--disable-dev-shm-usage")

driver = webdriver.Chrome(options=options)
driver.get("https://www.oddsportal.com/basketball/italy/lega-a-super-cup/sassari-brindisi-rTJFaIyk/")

# Collect the text of every right-aligned cell in the row with class "highest"
home_average_odds = [
    my_elem.text
    for my_elem in WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located(
            (By.XPATH, '//*[@class="highest"]/td[contains(@class, "right")]')
        )
    )
]
for i in home_average_odds:
    print(i)

driver.close()
driver.quit()
The problem is that I am not getting the result I expect. Here is the output:
1.31
4.30
100.4%
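The third value in the output above ends in a percent sign, so it looks like the payout cell of the same row rather than an odds value. As a minimal sketch, assuming the scraped cell texts arrive as plain strings like the three shown, the odds can be separated from percentage cells after scraping. The helper name `split_odds_and_payout` is hypothetical and not part of Selenium.

```python
def split_odds_and_payout(cell_texts):
    """Separate decimal odds from percentage cells in a list of scraped strings."""
    odds = []
    payouts = []
    for text in cell_texts:
        text = text.strip()
        if not text:
            # Skip empty cells
            continue
        if text.endswith("%"):
            # Percentage cells (e.g. the payout) are kept apart from the odds
            payouts.append(float(text.rstrip("%")))
        else:
            odds.append(float(text))
    return odds, payouts


# The three strings printed by the script above
scraped = ["1.31", "4.30", "100.4%"]
odds, payouts = split_odds_and_payout(scraped)
home_odds, away_odds = odds
print(home_odds, away_odds)
```

This only tidies what the XPath already returned. It does not explain why the away value reads 4.30 instead of the expected 4.57.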
What do you mean by a "good result"?
You can get the average by reading the table into pandas and then selecting that row:
from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
# Passing the Chrome flag directly instead of setting options.headless
options.add_argument("--headless")
options.add_argument("window-size=1400,800")
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("start-maximized")
options.add_argument("enable-automation")
options.add_argument("--disable-infobars")
options.add_argument("--disable-dev-shm-usage")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://www.oddsportal.com/basketball/italy/lega-a-super-cup/sassari-brindisi-rTJFaIyk/")
    # Wait until at least one table is present before reading the page source
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "table")))
    html = driver.page_source
finally:
    driver.quit()

# Parse the first HTML table on the page into a DataFrame
df = pd.read_html(StringIO(html))[0]
# Keep only the summary row labelled "Average"
avg = df[df['Bookmakers'] == 'Average']
print(avg)
Output:
   Bookmakers     1     2 Payout  Unnamed: 4
49    Average  -408  +291  94.4%         NaN
The output matches the table.
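Note that the Average row above shows -408 and +291, which look like American (moneyline) odds, while the question quotes decimal odds such as 1.31. As a rough sketch, assuming those values really are American odds, they can be converted to decimal with the standard formula. The function name `american_to_decimal` is hypothetical.

```python
def american_to_decimal(american):
    """Convert American (moneyline) odds, given as a string or number, to decimal odds."""
    value = float(str(american).strip().replace("+", ""))
    if value == 0:
        raise ValueError("American odds cannot be zero")
    if value > 0:
        # Positive odds: profit on a 100-unit stake
        return 1 + value / 100
    # Negative odds: stake needed to win 100 units
    return 1 + 100 / abs(value)


# Values taken from the Average row printed above
home_decimal = american_to_decimal("-408")
away_decimal = american_to_decimal("+291")
print(round(home_decimal, 2), round(away_decimal, 2))
```

Applied to a DataFrame, the same function could be mapped over the `1` and `2` columns, for example with `df['1'].map(american_to_decimal)`, provided every cell in the column parses as a number.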