Selenium searching for ID not always working?
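In the past I often ran into problems with sites that use "lazy loading".
It helped to look up an element by its id and scroll it into view, like this: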
element = driver.find_element(By.ID, "analyst-estimate")
driver.execute_script("arguments[0].scrollIntoView();", element)
Now I've found that this doesn't work on every site.
On the following site everything works fine:
import os
import sys
import time
from sys import platform

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

link = "https://www.gurufocus.com/stock/AAPL/summary"
options = Options()
options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
path = os.path.abspath(os.path.dirname(sys.argv[0]))
if platform == "win32": cd = '/chromedriver.exe'
elif platform == "linux": cd = '/chromedriver_linux'
elif platform == "darwin": cd = '/chromedriver'
driver = webdriver.Chrome(service=Service(path + cd), options=options)
driver.get(link)  # Read link
time.sleep(2)  # Wait till the full site is loaded
element = driver.find_element(By.ID, "analyst-estimate")
driver.execute_script("arguments[0].scrollIntoView();", element)
time.sleep(1)
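As an aside, the platform branch in this script leaves cd undefined on any platform other than the three listed, which would only surface later as a confusing NameError. A small sketch that factors the branch into a function and fails loudly instead; chromedriver_path and DRIVER_FILES are hypothetical names, and the file names are the asker's own local driver files, not something Selenium requires.

```python
import os

# Driver file names as used in the script above (the asker's local files,
# not names Selenium requires).
DRIVER_FILES = {
    "win32": "chromedriver.exe",
    "linux": "chromedriver_linux",
    "darwin": "chromedriver",
}


def chromedriver_path(base_dir: str, plat: str) -> str:
    """Return the chromedriver path for a sys.platform value, or fail loudly."""
    try:
        name = DRIVER_FILES[plat]
    except KeyError:
        raise RuntimeError(f"no chromedriver configured for platform {plat!r}") from None
    return os.path.join(base_dir, name)


# Usage (hypothetical): chromedriver_path(os.path.abspath(os.path.dirname(sys.argv[0])), sys.platform)
```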
But on another site (which also has an id) it doesn't work at all:
link = "https://finance.yahoo.com/quote/MSFT/analysis?p=MSFT"
options = Options()
options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
path = os.path.abspath(os.path.dirname(sys.argv[0]))
if platform == "win32": cd = '/chromedriver.exe'
elif platform == "linux": cd = '/chromedriver_linux'
elif platform == "darwin": cd = '/chromedriver'
driver = webdriver.Chrome(service=Service(path + cd), options=options)
driver.get(link)  # Read link
time.sleep(2)  # Wait till the full site is loaded
element = driver.find_element(By.ID, "YDC-Col1")
# element = driver.find_element(By.ID, "Col2-4-QuoteModule-Proxy")
# element = driver.find_element(By.ID, "app")
driver.execute_script("arguments[0].scrollIntoView();", element)
time.sleep(1)
Why doesn't this work on the second site?
It's exactly the same code, so why can't it find the ID when it exists on the page?
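time.sleep() is not a reliable way to wait for a page to load. Switch to WebDriver waits. Also, a fixed 2 seconds doesn't seem to match how long the page actually takes to load.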
wait = WebDriverWait(driver, 5)
wait.until(EC.presence_of_element_located((By.ID, "YDC-Col1")))
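Roughly, an explicit wait keeps retrying the lookup until it succeeds or a timeout expires, instead of sleeping for a fixed time and hoping the page is ready. The sketch below illustrates that polling idea without Selenium, so it runs against any object that has a find_element(by, value) method; it is a simplified stand-in for WebDriverWait, not its actual implementation, and find_when_present is a hypothetical name. It relies on By.ID being the string "id" in Selenium.

```python
import time
from typing import Any


def find_when_present(driver: Any, element_id: str, timeout: float = 5.0,
                      poll: float = 0.5) -> Any:
    """Poll driver.find_element("id", element_id) until it succeeds or times out.

    "id" is the value of Selenium's By.ID. In this simplified sketch any lookup
    error is treated as "not there yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return driver.find_element("id", element_id)
        except Exception as exc:  # Selenium would raise NoSuchElementException
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"element #{element_id} not found within {timeout}s") from last_error
        time.sleep(poll)  # short pause between attempts, not a fixed page-load guess
```

Once the element is returned, it can be passed to driver.execute_script("arguments[0].scrollIntoView();", element) exactly as in the question.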
Another problem could be running headless without setting the window size.
options.add_argument('--headless')
options.add_argument("--window-size=1920,1080")
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
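The window-size advice can be folded into a small helper that builds the Chrome arguments, with a switch to turn headless mode off while debugging. A hypothetical sketch: chrome_args is an illustrative name, and it only produces the same --headless and --window-size=W,H flags used in this answer.

```python
from typing import List


def chrome_args(headless: bool = True, width: int = 1920, height: int = 1080) -> List[str]:
    """Build Chrome command-line arguments to pass to options.add_argument()."""
    args = [f"--window-size={width},{height}"]
    if headless:
        # Headless Chrome may start with a smaller window than a desktop
        # browser, so the size is always set explicitly.
        args.insert(0, "--headless")
    return args


# Usage with Selenium's Options (not executed here; `debugging` is hypothetical):
# for arg in chrome_args(headless=not debugging):
#     options.add_argument(arg)
```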
Before the page loads, a cookie consent pop-up appears, and you have to click it first:
WebDriverWait(driver, 5).until(
    EC.element_to_be_clickable((By.NAME, "agree"))).click()
WebDriverWait(driver, 5).until(
    EC.presence_of_element_located((By.ID, "YDC-Col1")))
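The consent prompt may not appear on every run, so it can help to click it only when it shows up and otherwise continue to the real page. A duck-typed sketch of that flow, assuming a driver object with find_element(by, value) whose result has a .click() method; "name" and "id" are the string values of Selenium's By.NAME and By.ID. accept_consent_then_find and _poll are hypothetical names, and the "agree" button name comes from this answer and may change on Yahoo's side.

```python
import time
from typing import Any, Callable


def _poll(lookup: Callable[[], Any], timeout: float, poll: float) -> Any:
    """Call lookup() until it succeeds; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return lookup()
        except Exception:  # stands in for Selenium's NoSuchElementException
            if time.monotonic() >= deadline:
                raise TimeoutError("element did not appear in time") from None
        time.sleep(poll)


def accept_consent_then_find(driver: Any, target_id: str, consent_name: str = "agree",
                             timeout: float = 5.0, consent_timeout: float = 3.0,
                             poll: float = 0.25) -> Any:
    """Click the consent button if it shows up, then wait for the target element."""
    try:
        button = _poll(lambda: driver.find_element("name", consent_name),
                       consent_timeout, poll)
    except TimeoutError:
        button = None  # no consent prompt this time
    if button is not None:
        button.click()
    return _poll(lambda: driver.find_element("id", target_id), timeout, poll)
```

A shorter consent_timeout keeps the run from stalling for the full timeout when no prompt is shown.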
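Before testing something in headless mode, check it in non-headless mode first to see how the page actually behaves. If it fails only in headless mode, take a screenshot to see the state of the site at the moment of failure.
You can take a screenshot like this: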
import time

from selenium import webdriver
from selenium.webdriver import ChromeOptions
from selenium.webdriver.common.by import By

link = "https://finance.yahoo.com/quote/MSFT/analysis?p=MSFT"
options = ChromeOptions()
options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options)  # created outside try so it exists in except
try:
    driver.get(link)  # Read link
    time.sleep(2)  # Wait till the full site is loaded
    element = driver.find_element(By.ID, "YDC-Col1")
    # element = driver.find_element(By.ID, "Col2-4-QuoteModule-Proxy")
    # element = driver.find_element(By.ID, "app")
    driver.execute_script("arguments[0].scrollIntoView();", element)
    time.sleep(1)
except Exception:
    # Save what the browser was showing when the step failed (Selenium expects .png)
    driver.get_screenshot_as_file("a.png")
    raise
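The screenshot-on-failure pattern can be packaged as a context manager so every risky step gets the same treatment. A sketch, assuming only that the driver has Selenium's get_screenshot_as_file(filename) method; screenshot_on_error is a hypothetical name. It re-raises after saving the screenshot so the original error is still reported.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def screenshot_on_error(driver: Any, filename: str = "failure.png") -> Iterator[None]:
    """Save a screenshot if the with-block raises, then re-raise the error."""
    try:
        yield
    except Exception:
        # Capture the page state at the moment of failure before propagating.
        driver.get_screenshot_as_file(filename)
        raise


# Usage with a real Selenium driver (not executed here):
# with screenshot_on_error(driver, "yahoo.png"):
#     element = driver.find_element(By.ID, "YDC-Col1")
```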