Python & Selenium: Iterate through list of WebElements Error: StaleElementReferenceException
Good afternoon,
I'm fairly new to Python and web scraping, so any help is appreciated! First:
Code
from selenium import webdriver
import time

chrome_path = r"/Users/ENTER/Desktop/chromedriver"
driver = webdriver.Chrome(chrome_path)
site_url = 'https://www.home-school.com/groups/'
driver.get(site_url)

# get state links from sidebar and store to list
area = driver.find_element_by_xpath("""/html/body/center/table/tbody/tr/td/table[3]/tbody/tr/td[2]/div""")
items = area.find_elements_by_tag_name('a')

# remove unneeded links
del items[:22]
del items[-1:]

for links in items:
    # print(links.text)
    print(links.get_attribute("href"))
    # add link related logic here
    links.click()
    # you have to wait for the next element to display
    time.sleep(4)
    # assign html container with desired data to variable
    element = driver.find_element_by_xpath("""/html/body/center/table/tbody/tr/td/table[3]/tbody/tr/td[4]/div""")
    # Store container text in variable. We skip the first 5 lines of text as they
    # are unnecessary.
    orgdata = element.text.split("\n",5)[5]
    orgdata = orgdata.replace(' Edit Remove More', '').replace(' Edit Remove', '')
    # Write data to text file
    filepath = '/Users/ENTER/Documents/STEMBoard/Tiger Team/Lingo/' + links.text + '.txt'
    file_object = open(filepath, 'a')
    file_object.write(orgdata)
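Problem
I'm using Selenium to save the names and details of homeschool groups from http://home-school.com/groups/ into a separate text file for each state.
To do that, I store a list of links and want to iterate over it: click each link, scrape the data I need, clean up the text, and write the result to a text file named after the state.
When the "for" loop runs, I get StaleElementReferenceException: stale element reference: element is not attached to the page document.
I believe the error is raised when it reaches element = driver.find_element_by_xpath("""/html/body/center/table/tbody/tr/td/table[3]/tbody/tr/td[2]/div"""). As far as I can tell, this xpath has not changed. I assumed I needed to make the webdriver wait for the page to load, hence the time.sleep(4).
I'm sure this is a simple fix that will make perfect sense once I see it, but right now I'm stumped. Any help you can offer would be great! Thanks!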
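Try this. Clicking a link loads a new page, so the WebElement objects you collected from the first page go stale. Read each link's text and href into plain lists first, then load each URL with driver.get():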
from selenium import webdriver
import time

chrome_path = r"/Users/ENTER/Desktop/chromedriver"
driver = webdriver.Chrome(chrome_path)
site_url = 'https://www.home-school.com/groups/'
driver.get(site_url)

# get state links from sidebar and store to list
area = driver.find_element_by_xpath("/html/body/center/table/tbody/tr/td/table[3]/tbody/tr/td[2]/div")
items = area.find_elements_by_tag_name('a')

# remove unneeded links
del items[:22]
del items[-1:]

# copy the link text and URLs into plain strings before leaving the page
text_list = [i.text for i in items]
items = [i.get_attribute("href") for i in items]

for i in range(len(items)):
    driver.get(items[i])
    # you have to wait for the next element to display
    time.sleep(2)
    # assign html container with desired data to variable
    element = driver.find_element_by_xpath("""/html/body/center/table/tbody/tr/td/table[3]/tbody/tr/td[2]/div""")
    # Store container text in variable. We skip the first 5 lines of text as they
    # are unnecessary.
    orgdata = element.text.split("\n",5)[5]
    orgdata = orgdata.replace(' Edit Remove More', '').replace(' Edit Remove', '')
    # Write data to text file
    filepath = '/Users/ENTER/Documents/STEMBoard/Tiger Team/Lingo/' + text_list[i] + '.txt'
    file_object = open(filepath, 'a')
    file_object.write(orgdata)
    file_object.close()
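If it helps, here is a rough sketch of the browser-independent pieces of the loop above, pulled out into small functions so they can be checked without launching Chrome. The function names (snapshot_links, clean_group_text, append_state_file) are made up for illustration and are not part of Selenium. The sketch assumes the elements passed in expose .text and .get_attribute("href") the way Selenium WebElements do, and that the page text follows the "skip 5 lines, strip the Edit/Remove labels" layout from the question.

```python
from pathlib import Path


def snapshot_links(elements):
    """Copy (text, href) out of link elements while their page is still loaded."""
    # Only plain strings are kept, so nothing returned here can go stale
    # after the browser navigates away. (Hypothetical helper.)
    return [(el.text, el.get_attribute("href")) for el in elements]


def clean_group_text(raw_text, skip_lines=5):
    """Drop the first `skip_lines` lines and strip the edit-control labels."""
    parts = raw_text.split("\n", skip_lines)
    # If the container has fewer lines than expected, keep nothing
    # instead of raising IndexError.
    body = parts[skip_lines] if len(parts) > skip_lines else ""
    return body.replace(" Edit Remove More", "").replace(" Edit Remove", "")


def append_state_file(out_dir, state_name, text):
    """Append `text` to <out_dir>/<state_name>.txt and return the path."""
    path = Path(out_dir) / (state_name.strip() + ".txt")
    # The with-block closes the file even if write() raises.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

In the loop you would call snapshot_links(items) once on the start page, then for each (name, url) pair call driver.get(url), locate the container, and pass clean_group_text(element.text) to append_state_file. Selenium's WebDriverWait with expected_conditions.presence_of_element_located is the usual replacement for a fixed time.sleep(2), though that part is not shown here.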