I'm getting a recursive error [RuntimeError: maximum recursion depth exceeded while calling a Python object] - but my code is iterative -- or is it?
I'm getting a recursion error:

RuntimeError: maximum recursion depth exceeded while calling a Python object

But my code is iterative... or is it? I thought it was, based on the documentation (here, for example: http://www.pythonlearn.com/html-008/cfbook006.html). I've been reading up on how to change an algorithm/code from recursive to iterative (e.g., http://blog.moertel.com/posts/2013-05-11-recursive-to-iterative.html), but I just don't see how mine is recursive.
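To see the distinction that second link draws, here is a minimal sketch (a toy countdown, not taken from my scraper): the recursive version raises the same maximum-recursion-depth error for large inputs, while the iterative version runs in constant stack space.

import sys
print(sys.getrecursionlimit())  # typically 1000 by default

def countdown_recursive(n):
    if n > 0:
        countdown_recursive(n - 1)  # every call adds a new stack frame

def countdown_iterative(n):
    while n > 0:  # the loop reuses one frame, so no depth limit applies
        n -= 1

countdown_iterative(10**6)  # fine
countdown_recursive(10**6)  # RuntimeError: maximum recursion depth exceeded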
The code goes to a website, performs a search, and returns roughly 122 pages of results. It then clicks through each results page and collects the links. It is then meant to click on each link and scrape the text/html from each one.
The code runs beautifully until it reaches the final for loop: for url in article_urls:. It captures and stores (in Dropbox) more than 200 shtml pages before returning the error.
The puzzle I'm trying to solve: how do I avoid this error?
Here is the code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

def isReady(browser):
    return browser.execute_script("return document.readyState") == "complete"

def waitUntilReady(browser):
    if not isReady(browser):
        waitUntilReady(browser)

browser = webdriver.Firefox()
browser.get('http://www.usprwire.com/cgi-bin/news/search.cgi')

# make a search
query = WebDriverWait(browser, 60).until(EC.presence_of_element_located((By.NAME, "query")))
query.send_keys('"test"')
submit = browser.find_element_by_xpath("//input[@value='Search']")
submit.click()

numarticles = 0

# grab article urls
npages = 122
article_urls = []
for page in range(1, npages + 1):
    article_urls += [elm.get_attribute("href") for elm in browser.find_elements_by_class_name('category_links')]
    if page <= 121:  # click to the next page
        browser.find_element_by_link_text('[>>]').click()
    if page == 122:  # last page in search results, so no '[>>]' to click on. Move on to next steps.
        continue

# iterate over urls and save the HTML source
for url in article_urls:
    browser.get(url)
    waitUntilReady(browser)
    numarticles = numarticles + 1
    title = browser.current_url.split("/")[-1]
    with open('/Users/My/Dropbox/File/Place/' + str(numarticles) + str(title), 'w') as fw:
        fw.write(browser.page_source.encode('utf-8'))
Any input whatsoever is very much appreciated.
waitUntilReady is a recursive function! It can be called a lot of times, especially if you have a slow connection.
Here is a possible workaround:
import time

def waitUntilReady(browser):
    while not isReady(browser):
        time.sleep(10)
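If you would rather not hand-roll the polling loop, Selenium's explicit waits can express the same check; a minimal sketch (the 60-second timeout is an arbitrary choice):

from selenium.webdriver.support.ui import WebDriverWait

def waitUntilReady(browser, timeout=60):
    # until() re-evaluates the lambda until it returns a truthy value,
    # or raises TimeoutException after `timeout` seconds
    WebDriverWait(browser, timeout).until(
        lambda driver: driver.execute_script("return document.readyState") == "complete"
    )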
Apparently, your waitUntilReady goes into infinite recursion, calling itself. You should change it to something like this:
while not isReady(browser):
    time.sleep(1)
Waiting for a page to be completely loaded in Selenium is not as obvious as it seems; you can read more about it in Harry J.W. Percival's article.
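A common technique along those lines is to wait for an element of the old page to go stale after a click, instead of polling document.readyState; a rough sketch of the idea, reusing the pagination click from the question:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

old_page = browser.find_element_by_tag_name('html')
browser.find_element_by_link_text('[>>]').click()
# the new page has arrived once the old <html> element is detached from the DOM
WebDriverWait(browser, 60).until(EC.staleness_of(old_page))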