Extracting links with a specific class with Selenium in Python
I'm trying to extract links from a website that uses infinite scrolling.
This is my code for scrolling down the page:
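import time
from selenium import webdriver

# raw string so the backslashes in the Windows path are not read as escape sequences
driver = webdriver.Chrome(r'C:\Program Files (x86)\Google\Chrome\chromedriver.exe')
driver.implicitly_wait(15)  # applies to the whole session, so it only needs to be set once
driver.get('http://seekingalpha.com/market-news/top-news')
for i in range(2):
    # scroll to the bottom so the page loads the next batch of items
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(20)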
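The loop above scrolls a fixed two times and sleeps 20 seconds per round. A minimal sketch of a common alternative, assuming the page appends new items whenever the viewport reaches the bottom: keep scrolling until `document.body.scrollHeight` stops growing. `scroll_to_bottom`, `pause` and `max_rounds` are hypothetical names chosen for illustration, and `pause` has to be long enough for the site to load the next batch.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=10):
    """Scroll until document.body.scrollHeight stops growing (or max_rounds is hit).

    Hypothetical helper; returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the site time to append the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # height unchanged: assume no more content was loaded
        last_height = new_height
    return rounds
```

Usage: call `scroll_to_bottom(driver, pause=3)` after `driver.get(...)` and before collecting the links. `max_rounds` caps the loop in case the feed never ends.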
My goal is to extract specific links from this page: the ones with class="market_current_title". Their HTML looks like this:
<a class="market_current_title" href="/news/3223955-dow-wraps-best-week-since-2011-s-and-p-strongest-week-since-2014" sasource="titles_mc_top_news" target="_self">Dow wraps up best week since 2011; S&P in strongest week since 2014</a>
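Since the target anchors share the class `market_current_title`, one way to grab their URLs, sketched here under the assumption that plain strings are all you need, is to read them inside the browser with `execute_script`, so Python never holds `WebElement` references that could go stale. `collect_hrefs_js` and `JS_COLLECT_HREFS` are hypothetical names; the CSS selector `a.market_current_title` is derived from the HTML above. Note that the DOM `href` property returns the resolved absolute URL, not the relative `/news/...` value written in the markup.

```python
# Hypothetical script: collect the href property of every element matching
# the CSS selector passed as the first script argument.
JS_COLLECT_HREFS = """
return Array.from(
    document.querySelectorAll(arguments[0]),
    function (a) { return a.href; }
);
"""


def collect_hrefs_js(driver, css="a.market_current_title"):
    # Runs entirely in the browser; only a list of strings comes back to Python.
    hrefs = driver.execute_script(JS_COLLECT_HREFS, css)
    return list(hrefs or [])
```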
When I use
URL = driver.find_elements_by_class_name('market_current_title')
I end up with the error "stale element reference: element is not attached to the page document". Then I tried
URL = driver.find_elements_by_xpath("//div[@id='a']//a[@class='market_current_title']")
But it says there is no such link!
Do you have any idea how to solve this?
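You are probably trying to interact with an element that has changed (possibly one that was above the scroll position and is now off screen). Try this answer for some good options on how to overcome this.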
Here is a snippet:
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
import selenium.webdriver.support.expected_conditions as EC
import selenium.webdriver.support.ui as ui

# return True if the element matching the CSS selector `locator` becomes
# visible within `timeout` seconds (default 2), otherwise False
def is_visible(driver, locator, timeout=2):
    try:
        ui.WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, locator)))
        return True
    except TimeoutException:
        return False