How to make a webpage stop loading and extract text from it

I want to extract text from a URL shortener using the following code:


    import os
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    
    os.environ['PATH'] += 'C:/Selenium Drivers'
    driver = webdriver.Chrome()
    driver.implicitly_wait(10)
    driver.get('https://pastebin.com/vkuagfwV')
    strings = str(driver.find_element(By.CLASS_NAME, 'textarea').text)
    strings = strings.replace("\n", " ")
    driver.close()
    
    print(strings)

But this code does not work until I manually stop the page from loading. I also tried using XPATH, but that did not help.

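Try using an explicit WebDriverWait with the visibility_of_element_located expected condition here instead of implicitly_wait.
Also, as mentioned in the comments, you don't need the str cast there, since .text already returns a string.
Please try this: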

    import os
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # PATH entries must be separated by os.pathsep (';' on Windows)
    os.environ['PATH'] += os.pathsep + 'C:/Selenium Drivers'
    driver = webdriver.Chrome()
    # Explicit wait: poll for up to 20 seconds
    wait = WebDriverWait(driver, 20)

    driver.get('https://pastebin.com/vkuagfwV')
    # Block until the element is visible, then read its text
    strings = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "textarea"))).text

    strings = strings.replace("\n", " ")
    driver.close()

    print(strings)
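
If the element never becomes visible, wait.until raises TimeoutException and the cleanup call after it is never reached, which leaves a browser process behind. Below is a minimal sketch of the same wait wrapped so that the browser always shuts down. The names flatten_newlines and read_element_text are hypothetical helpers, not part of Selenium, and the sketch assumes Selenium can find a ChromeDriver on its own.

```python
def flatten_newlines(text):
    """Replace line breaks with spaces, as the snippet above does."""
    return text.replace("\n", " ")


def read_element_text(url, class_name, timeout=20):
    """Open url, wait until the element is visible, return its text or None."""
    # Selenium is imported here so flatten_newlines stays usable without it.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CLASS_NAME, class_name))
        )
        return flatten_newlines(element.text)
    except TimeoutException:
        # The element did not become visible within `timeout` seconds.
        return None
    finally:
        # quit() ends the whole browser session, even after an error.
        driver.quit()
```

Calling read_element_text('https://pastebin.com/vkuagfwV', 'textarea') would then return the flattened text, or None if the wait timed out.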

UPD
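Please add the eager pageLoadStrategy setting, so that driver.get returns once the DOM is ready instead of waiting for the whole page to finish loading: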

    import os
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    # PATH entries must be separated by os.pathsep (';' on Windows)
    os.environ['PATH'] += os.pathsep + 'C:/Selenium Drivers'
    # Copy the defaults so the shared class-level dict is not modified
    caps = DesiredCapabilities.CHROME.copy()
    # "eager": driver.get() returns once the DOM is ready (DOMContentLoaded)
    caps["pageLoadStrategy"] = "eager"
    # desired_capabilities and executable_path are Selenium 3 style arguments
    driver = webdriver.Chrome(desired_capabilities=caps, executable_path=r'C:\path\to\chromedriver.exe')

    #driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 20)

    driver.get('https://pastebin.com/vkuagfwV')
    # Block until the element is visible, then read its text
    strings = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "textarea"))).text

    strings = strings.replace("\n", " ")
    driver.close()

    print(strings)
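
The desired_capabilities and executable_path arguments used above belong to Selenium 3 and are no longer accepted by recent Selenium 4 releases. Here is a hedged sketch of the same idea for Selenium 4, which sets the strategy through ChromeOptions. The names PAGE_LOAD_STRATEGIES, check_strategy and open_page are hypothetical, and the sketch assumes Selenium can locate ChromeDriver itself. The page load timeout with a window.stop() fallback is an addition that is not in the answer above; it covers pages that hang even under the eager strategy.

```python
# Page load strategies defined by the W3C WebDriver specification.
PAGE_LOAD_STRATEGIES = {
    "normal": "wait for the load event (all resources fetched)",
    "eager": "wait for DOMContentLoaded only",
    "none": "do not wait for the document to finish loading",
}


def check_strategy(name):
    """Return name if it is a valid page load strategy, else raise ValueError."""
    if name not in PAGE_LOAD_STRATEGIES:
        raise ValueError(f"unknown page load strategy: {name!r}")
    return name


def open_page(url, strategy="eager", page_load_timeout=30):
    """Start Chrome with the given strategy and open url; returns the driver."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException

    options = webdriver.ChromeOptions()
    options.page_load_strategy = check_strategy(strategy)
    driver = webdriver.Chrome(options=options)
    driver.set_page_load_timeout(page_load_timeout)
    try:
        driver.get(url)
    except TimeoutException:
        # Assumption: the page is still loading; ask the browser to stop,
        # much like pressing the stop button by hand.
        driver.execute_script("window.stop();")
    return driver
```

The returned driver can be passed to WebDriverWait exactly as in the snippets above; the caller is responsible for calling driver.quit() when done.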