Python Selenium webscraping returns no data using XPATH

I am trying to scrape data from a web page. After logging in to the site, I can search for the XPath in the developer tools and find matches. However, the Python code returns no data.

from datetime import datetime
from sqlite3 import Time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import requests
from bs4 import BeautifulSoup
import zerodhacred

browser = webdriver.Chrome("C:/Users/SPILLAIP/Downloads/chromedriver_win32/chromedriver.exe")
browser.get(loginURL)

nifty_bank_values_xpath = browser.find_elements(By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")
print("Len of nifty_bank_values_xpath: ",len(nifty_bank_values_xpath))

The output is:

d:\My Personal22\Suresh\Learning\Python\zerodha.py:27: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  browser = webdriver.Chrome("C:/Users/SPILLAIP/Downloads/chromedriver_win32/chromedriver.exe")

DevTools listening on ws://127.0.0.1:57153/devtools/browser/74a90941-a12f-4be4-b12a-01b256292a5f
[15120:6400:0309/123030.129:ERROR:device_event_log_impl.cc(214)] [12:30:30.129] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[15120:6400:0309/123030.137:ERROR:device_event_log_impl.cc(214)] [12:30:30.136] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
Len of nifty_bank_values_xpath:  0

Similarly, when I try find_element:

nifty_bank_values_xpath = browser.find_element(By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")

I get the following error:

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//span[contains(@class, 'pane-legend-item-value__main')]"}
  (Session info: chrome=99.0.4844.51)
Stacktrace:
Backtrace:

In Dev Tools -> Elements, the same XPath finds the data and returns 6 matches.

HTML captured from the developer console:

<body class="app-wrapper">
   <noscript><strong>We're sorry but kite doesn't work properly without JavaScript enabled. Please enable it to continue.</strong></noscript>
   <div id="app" class="app mobile page-tvchart">
      <div class="header">
         <div class="wrapper">
            <!----> 
            <div class="header-right">
               <!----> <!----> 
               <div class="app-nav mobile"><a href="/marketwatch" class=""><span class="icon icon-bookmark"></span></a> <a href="/dashboard" class=""><span class="icon icon-compass"></span></a> <a href="/orders" class=""><span class="icon icon-book"></span></a> <a href="/holdings" class=""><span class="icon icon-briefcase"></span></a> <a href="/positions" class=""><span class="icon icon-file-text"></span></a> <a href="/funds" class="margins"><span class="icon icon-credit-card"></span></a></div>
               <div class="right-nav">
                  <div class="user-nav perspective">
                     <a href="" class="dropdown-label">
                        <div id="avatar-43">
                           <div class="avatar" style="width: 25px; height: 25px; border-radius: 50%; text-align: center; vertical-align: middle; background-color: rgba(156, 39, 176, 0.1); font-size: 9px; font-weight: 300; color: rgb(156, 39, 176); line-height: 26px;"><span>SS</span></div>
                           <!---->
                        </div>
                        <span class="user-id">ZX8487</span>
                     </a>
                     <!---->
                  </div>
               </div>
            </div>
         </div>
      </div>
      <div class="container wrapper">
         <!----> 
         <div class="container-right">
            <!----> <!----> 
            <div class="page-content tvchart">
               <!----> 
               <div>
                  <div class="chart-frame">
                     <div id="tv_chart_container" class="tv-chart-container" style="height: 547px;"><iframe id="tradingview_e1a6c" name="tradingview_e1a6c" src="/static/tv-chart/static/en-tv-chart.aaac22e21df68f2f7bad.html#symbol=NIFTY%20BANK%3AINDICES%3A260105&amp;interval=1D&amp;widgetbar=%7B%22details%22%3Afalse%2C%22watchlist%22%3Afalse%2C%22watchlist_settings%22%3A%7B%22default_symbols%22%3A%5B%5D%7D%7D&amp;timeFrames=%5B%7B%22text%22%3A%225y%22%2C%22resolution%22%3A%22W%22%7D%2C%7B%22text%22%3A%221y%22%2C%22resolution%22%3A%22W%22%7D%2C%7B%22text%22%3A%226m%22%2C%22resolution%22%3A%22120%22%7D%2C%7B%22text%22%3A%223m%22%2C%22resolution%22%3A%2260%22%7D%2C%7B%22text%22%3A%221m%22%2C%22resolution%22%3A%2230%22%7D%2C%7B%22text%22%3A%225d%22%2C%22resolution%22%3A%225%22%7D%2C%7B%22text%22%3A%221d%22%2C%22resolution%22%3A%221%22%7D%5D&amp;locale=en&amp;uid=tradingview_e1a6c&amp;clientId=tradingview.com&amp;userId=ZX8487&amp;chartsStorageUrl=%2Fapi%2Fchart%2Fpreferences&amp;chartsStorageVer=1.1&amp;customCSS=%2Fstatic%2Ftv-chart%2Fstatic%2Fcustom_style.css&amp;debug=false&amp;timezone=Asia%2FKolkata&amp;theme=Light" frameborder="0" allowtransparency="true" scrolling="no" allowfullscreen="" style="display: block; width: 100%; height: 100%;"></iframe></div>
                  </div>
                  <div class="instrument-market-data">
                     <div class="row">
                        <div class="three columns">
                           <div class="label">Open</div>
                           <div class="value">33278.9</div>
                        </div>
                        <div class="three columns">
                           <div class="label">High</div>
                           <div class="value">33890.9</div>
                        </div>
                        <div class="three columns">
                           <div class="label">Low</div>
                           <div class="value">32948.9</div>
                        </div>
                        <div class="three columns">
                           <div class="label">Close</div>
                           <div class="value">33158.1</div>
                        </div>
                     </div>
                     <div class="row">
                        <div class="three columns">
                           <div class="label">Volume</div>
                           <div class="value">—</div>
                        </div>
                        <div class="three columns">
                           <div class="label">Avg. trade price</div>
                           <div class="value">—</div>
                        </div>
                        <div class="three columns">
                           <div class="label">Total buy quantity</div>
                           <div class="value">—</div>
                        </div>
                        <div class="three columns">
                           <div class="label">Total sell quantity</div>
                           <div class="value">—</div>
                        </div>
                     </div>
                  </div>
                  <!---->
               </div>
            </div>
         </div>
      </div>
      <!----> <!----> 
      <div class="baskets">
         <!----> <!----> <!----> <!----> <!----> <!---->
      </div>
      <!----> 
      <div>
         <!----> <!---->
      </div>
      <!----> <!----> <!----> <!----> 
      <div class="orders-basket">
         <!---->
      </div>
      <!----> <!---->
   </div>
   <script async="">try {
      var theme = JSON.parse(localStorage.__storejs_kite_theme);
      if (theme) {
        document.documentElement.setAttribute("data-theme", theme);
      }
      } catch (_) {
      }
   </script><script type="module" src="/static/js/chunk-vendors.ea6114a1.js"></script><script type="module" src="/static/js/app.ae4bb317.js"></script><script>!function(){var e=document,t=e.createElement("script");if(!("noModule"in t)&&"onbeforeload"in t){var n=!1;e.addEventListener("beforeload",function(e){if(e.target===t)n=!0;else if(!e.target.hasAttribute("nomodule")||!n)return;e.preventDefault()},!0),t.type="module",t.src=".",e.head.appendChild(t),t.remove()}}();</script><script src="/static/js/chunk-vendors-legacy.ea6114a1.js" nomodule=""></script><script src="/static/js/app-legacy.cffeb71c.js" nomodule=""></script>
   <div class="su-toast-groups">
      <div class="su-toast-group su-toast-top-left">
         <div></div>
      </div>
      <div class="su-toast-group su-toast-top-center">
         <div></div>
      </div>
      <div class="su-toast-group su-toast-top-right">
         <div></div>
      </div>
      <div class="su-toast-group su-toast-bottom-left">
         <div></div>
      </div>
      <div class="su-toast-group su-toast-bottom-center">
         <div></div>
      </div>
      <div class="su-toast-group su-toast-bottom-right">
         <div></div>
      </div>
   </div>
   <!---->
</body>

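You are missing a wait here.
You should wait for the elements to be fully loaded before accessing them with the find_elements method.
The best approach here is an explicit wait with Expected Conditions, as follows: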

from datetime import datetime
from sqlite3 import Time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import requests
from bs4 import BeautifulSoup
import zerodhacred

browser = webdriver.Chrome("C:/Users/SPILLAIP/Downloads/chromedriver_win32/chromedriver.exe")
wait = WebDriverWait(browser, 20)
browser.get(loginURL)
wait.until(EC.visibility_of_element_located((By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")))
time.sleep(0.3) #short pause added to make sure that all the relevant elements are loaded, not only the first one
nifty_bank_values_xpath = browser.find_elements(By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")
print("Len of nifty_bank_values_xpath: ",len(nifty_bank_values_xpath))
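
The fixed `time.sleep(0.3)` above could also be replaced by a condition that polls until enough elements are present. This is only a minimal sketch: `at_least_n_elements` is a hypothetical helper, not part of Selenium, and the count of 6 is an assumption taken from the number of matches reported in Dev Tools. The locator is a plain `(strategy, value)` tuple such as `(By.XPATH, "...")`.

```python
def at_least_n_elements(locator, n):
    """Build a wait condition that passes once `locator` matches >= n elements.

    Hypothetical helper (not part of Selenium). `locator` is a
    (strategy, value) tuple, e.g. (By.XPATH, "//span[...]").
    """
    def _condition(driver):
        elements = driver.find_elements(*locator)
        # WebDriverWait keeps polling while the condition returns a falsy value.
        return elements if len(elements) >= n else False
    return _condition


# Intended usage with the `wait` object from the snippet above
# (6 is the match count seen in Dev Tools; adjust as needed):
# values = wait.until(at_least_n_elements(
#     (By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]"), 6))
```

`wait.until` returns whatever the condition returns, so `values` would be the list of matched elements. The condition can only succeed when the driver is looking at the document that actually contains the elements.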

Update:
Since the element is located inside an iframe, you have to switch to that iframe before you can access the elements inside it, as follows:

from datetime import datetime
from sqlite3 import Time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import requests
from bs4 import BeautifulSoup
import zerodhacred

browser = webdriver.Chrome("C:/Users/SPILLAIP/Downloads/chromedriver_win32/chromedriver.exe")
wait = WebDriverWait(browser, 20)
browser.get(loginURL)
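# The iframe id carries a generated suffix (e.g. "tradingview_e1a6c"), so match on its prefix.
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[starts-with(@id, 'tradingview')]")))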

wait.until(EC.visibility_of_element_located((By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")))
time.sleep(0.3) #short pause added to make sure that all the relevant elements are loaded, not only the first one
nifty_bank_values_xpath = browser.find_elements(By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")
print("Len of nifty_bank_values_xpath: ",len(nifty_bank_values_xpath))

When you have finished working with the elements inside the iframe, switch back to the default content with

browser.switch_to.default_content()
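
To make it harder to forget the switch back, the enter/leave pair can be wrapped in a context manager. This is a rough sketch rather than an established Selenium feature: `inside_frame` is a hypothetical name, and it relies only on the `switch_to.frame` and `switch_to.default_content` calls already used in this thread.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a `with` block.

    Hypothetical helper (not part of Selenium). `frame_reference` is whatever
    `driver.switch_to.frame` accepts: a WebElement, a name/id string, or an index.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Always return to the top-level document, even if the body raises.
        driver.switch_to.default_content()


# Intended usage with the `browser` object from the snippets above:
# with inside_frame(browser, browser.find_element(By.TAG_NAME, "iframe")):
#     values = browser.find_elements(
#         By.XPATH, "//span[contains(@class, 'pane-legend-item-value__main')]")
```

Note that it always returns to the top-level document, so for nested iframes a different exit strategy would be needed.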

Thanks to @Prophet for pointing me towards the iframe.

browser.get(niftybankchartURL)
time.sleep(10)

# jump into iframe
# find_element_by_tag_name is deprecated in Selenium 4; use find_element with By.TAG_NAME
browser.switch_to.frame(browser.find_element(By.TAG_NAME, "iframe"))

Once switch_to has entered the frame, the XPath works as expected.
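
For anyone wondering why the original lookup came back empty: the top-level document only holds the `<iframe>` tag, not the chart markup that loads inside it. The sketch below scans a trimmed-down copy of the HTML from the question using only the standard library. The `src` value is a shortened placeholder, and `OuterDocumentScan` is a name made up for this illustration.

```python
from html.parser import HTMLParser

# Trimmed-down stand-in for the top-level page captured in the question
# (the iframe `src` is replaced by a short placeholder for readability).
OUTER_HTML = """
<body class="app-wrapper">
  <div id="tv_chart_container" class="tv-chart-container">
    <iframe id="tradingview_e1a6c" name="tradingview_e1a6c" src="chart.html"></iframe>
  </div>
  <div class="instrument-market-data">
    <div class="label">Open</div><div class="value">33278.9</div>
  </div>
</body>
"""


class OuterDocumentScan(HTMLParser):
    """Collect iframe ids and count <span> tags carrying the target class."""

    TARGET_CLASS = "pane-legend-item-value__main"

    def __init__(self):
        super().__init__()
        self.iframe_ids = []
        self.matching_spans = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "iframe":
            self.iframe_ids.append(attributes.get("id"))
        elif tag == "span" and self.TARGET_CLASS in (attributes.get("class") or ""):
            self.matching_spans += 1


scan = OuterDocumentScan()
scan.feed(OUTER_HTML)
print("iframes in outer document:", scan.iframe_ids)
print("matching spans in outer document:", scan.matching_spans)
```

The scan finds one iframe (`tradingview_e1a6c`) and zero matching spans, which is the same view Selenium has before `switch_to.frame` is called. Dev Tools, by contrast, searches whichever frame is currently selected in the Elements panel, which would explain the 6 matches there.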