How to make a webpage stop loading and extract text from it
I want to extract text from a URL shortener using the following code:
import os
from selenium import webdriver
from selenium.webdriver.common.by import By
os.environ['PATH'] += 'C:/Selenium Drivers'
driver = webdriver.Chrome()
driver.implicitly_wait(10)
driver.get('https://pastebin.com/vkuagfwV')
strings = str(driver.find_element(By.CLASS_NAME, 'textarea').text)
strings = strings.replace("\n", " ")
driver.close()
print(strings)
However, this code does not work until I manually stop the page from loading. I also tried locating the element by XPATH, but that did not help.
Try using the expected condition visibility_of_element_located here instead of implicitly_wait.
Also, as mentioned in the comments, you do not need the str cast there.
Please try this:
import os
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Append the driver folder to PATH; os.pathsep keeps it separate from the last existing entry
os.environ['PATH'] += os.pathsep + 'C:/Selenium Drivers'
driver = webdriver.Chrome()
# Explicit wait: poll for up to 20 seconds until the condition holds
wait = WebDriverWait(driver, 20)
driver.get('https://pastebin.com/vkuagfwV')
# .text is already a str, so no str() cast is needed
strings = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "textarea"))).text
strings = strings.replace("\n", " ")
driver.close()
print(strings)
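To make it clearer what an explicit wait does, here is a minimal pure-Python sketch of the polling idea. It is not Selenium's actual implementation: the names wait_until, WaitTimeout and find_textarea are made up for illustration, and LookupError stands in for the "element not found yet" error that WebDriverWait ignores while it keeps retrying (by default it polls every 0.5 seconds).

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy within the timeout."""


def wait_until(condition, timeout=20.0, poll_interval=0.5, ignored=(LookupError,)):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper that mimics the idea behind an explicit wait:
    "not found yet" errors are swallowed and the check is retried
    until the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            # Element is not there yet: try again on the next poll.
            pass
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Stand-in for a page whose element only shows up on the third lookup.
attempts = {"count": 0}


def find_textarea():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not in the DOM yet")
    return "text of the paste"


text = wait_until(find_textarea, timeout=2.0, poll_interval=0.01)
print(text, attempts["count"])
```

The point is that the wait returns as soon as the element is usable, so the script does not depend on a fixed delay.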
UPD
Please add the eager pageLoadStrategy setting:
import os
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
# Append the driver folder to PATH; os.pathsep keeps it separate from the last existing entry
os.environ['PATH'] += os.pathsep + 'C:/Selenium Drivers'
# Copy the defaults so the shared DesiredCapabilities.CHROME dict is not mutated
caps = DesiredCapabilities.CHROME.copy()
# "eager": driver.get() returns once the DOM is ready, without waiting for images and other subresources
caps["pageLoadStrategy"] = "eager"
driver = webdriver.Chrome(desired_capabilities=caps, executable_path=r'C:\path\to\chromedriver.exe')
#driver = webdriver.Chrome()
wait = WebDriverWait(driver, 20)
driver.get('https://pastebin.com/vkuagfwV')
strings = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "textarea"))).text
strings = strings.replace("\n", " ")
driver.close()
print(strings)
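Note that recent Selenium 4 releases no longer accept the desired_capabilities and executable_path arguments. On those versions the same eager setting can probably be expressed through ChromeOptions, roughly as sketched below. This assumes Selenium 4 with a chromedriver that Selenium can locate on its own; the helper names build_eager_chrome, flatten and fetch_text are made up for this example, and I have not run it against the page from the question.

```python
def build_eager_chrome():
    """Start Chrome with pageLoadStrategy set to "eager" (Selenium 4 style)."""
    # Imported lazily so the helpers below load even without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # "eager": driver.get() returns once the DOM is ready,
    # without waiting for images, stylesheets and other subresources.
    options.page_load_strategy = "eager"
    return webdriver.Chrome(options=options)


def flatten(text):
    """Replace newlines with spaces, as in the answer's code."""
    return text.replace("\n", " ")


def fetch_text(url, class_name="textarea", timeout=20):
    """Open url, wait for the element to become visible and return its text."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = build_eager_chrome()
    try:
        driver.get(url)
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CLASS_NAME, class_name))
        )
        return flatten(element.text)
    finally:
        # quit() ends the whole browser session even if the wait timed out.
        driver.quit()
```

Usage would be something like print(fetch_text('https://pastebin.com/vkuagfwV')). The try/finally is only there so the browser does not stay open when the wait raises a timeout.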