Scrape table data with pagination
I am trying to scrape a paginated table with Selenium. The site I am scraping does not expose the page number in the URL.
table = '//*[@id="result-tables"]/div[2]/div[2]/div/table/tbody'
home = driver.find_elements(By.XPATH, '//tbody/tr/td[5]')
away = driver.find_elements(By.XPATH, '//tbody/tr/td[7]')
teams = []
page = 0
while page < 10:
    page+=1
    time.sleep(5)
    for i in range(len(home)):
        temp_data = home[i].text + '\n' + away[i].text
        pair = teams.append(temp_data)
    next_page = driver.find_element(By.XPATH, '//*[@id="result-tables"]/div[3]/ul/li[12]/a/span').click()
teams = []
Only the data from the first page is stored. When the script moves on to the next page, I get this error:
Traceback (most recent call last):
  File "C:\Users\XXX\OneDrive\Documents\A\b\s_pc.py", line 49, in <module>
    temp_data = home[i].text + '\n' + away[i].text
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webelement.py", line 76, in text
    return self._execute(Command.GET_ELEMENT_TEXT)['value']
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webelement.py", line 693, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 418, in execute
    self.error_handler.check_response(response)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=96.0.4664.45)
Stacktrace:
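I defined the home and away elements inside the while loop and moved time.sleep() to the start of the loop. Elements located before the click go stale once the table is re-rendered, so they have to be looked up again on every page. With these changes the code ran without throwing any error.
Check whether this works as expected.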
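table = '//*[@id="result-tables"]/div[2]/div[2]/div/table/tbody'
teams = []
page = 0
while page < 10:
    # Give the current page time to render before locating anything.
    time.sleep(5)
    # Re-locate the cells on every page; references from a previous page go stale.
    home = driver.find_elements(By.XPATH, '//tbody/tr/td[5]')
    away = driver.find_elements(By.XPATH, '//tbody/tr/td[7]')
    page += 1
    for i in range(len(home)):
        temp_data = home[i].text + '\n' + away[i].text
        # list.append() returns None, so there is nothing useful to assign.
        teams.append(temp_data)
    # Click through to the next page; click() also returns None.
    driver.find_element(By.XPATH, '//*[@id="result-tables"]/div[3]/ul/li[12]/a/span').click()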
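
The same idea (look the cells up afresh on each page and copy their text out straight away) can be pulled into a small helper. This is only a sketch, not a drop-in solution: `collect_pages`, `make_selenium_reader`, `read_rows` and `go_next` are names made up for illustration, and the string "xpath" is passed where the code above uses By.XPATH, on the assumption that the two are equivalent in your Selenium version.

```python
from typing import Callable, List, Sequence, Tuple


def collect_pages(
    read_rows: Callable[[], Sequence[Tuple[str, str]]],
    go_next: Callable[[], bool],
    max_pages: int = 10,
) -> List[str]:
    """Collect 'home\\naway' strings page by page.

    read_rows must locate the rows anew on every call and return plain
    strings, so no element reference outlives the page it came from.
    go_next moves to the next page and returns False when there is none.
    """
    teams: List[str] = []
    for _ in range(max_pages):
        for home, away in read_rows():
            teams.append(home + "\n" + away)
        if not go_next():
            break
    return teams


def make_selenium_reader(driver, by_xpath: str = "xpath"):
    """Build a read_rows callable around a Selenium-like driver (hypothetical helper)."""

    def read_rows() -> List[Tuple[str, str]]:
        # Located on every call, never cached across page changes.
        home = driver.find_elements(by_xpath, '//tbody/tr/td[5]')
        away = driver.find_elements(by_xpath, '//tbody/tr/td[7]')
        # Copy the text out immediately; only plain strings leave this function.
        return [(h.text, a.text) for h, a in zip(home, away)]

    return read_rows
```

With a real driver, `go_next` would wrap the click on the "next" link followed by whatever wait you use (the time.sleep(5) above, for instance) and return True; the loop then stops after `max_pages` pages.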