WebDriverException: Message: Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE)" while saving a large html file using Selenium Python
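I am going through the Google Play Store reviews of apps, each specified by the URL of its app page. Selenium finds the reviews and scrolls down to load all of them. The scrolling part works: without the headless option I can see Selenium reach the end of the site. What does not work is saving the HTML content for further analysis.
Based on other answers, I tried different ways of saving the source code.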
innerHTML = DRIVER.execute_script("return document.body.innerHTML")
or
innerHTML = DRIVER.page_source
Both result in the same error message and exception.
My code for scrolling the page and loading all reviews:
import time

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.firefox.options import Options

# logger and START_URLS (the list of app page URLs) are defined elsewhere in the script.
SCROLL_PAUSE_TIME = 5

options = Options()
options.headless = True
FP = webdriver.FirefoxProfile()
FP.set_preference("intl.accept_languages", "de")

for url in START_URLS:
    # Create the driver before the try block so that DRIVER always exists in finally.
    DRIVER = webdriver.Firefox(options=options, firefox_profile=FP)
    try:
        DRIVER.get(url)
        time.sleep(SCROLL_PAUSE_TIME)
        app_name = DRIVER.find_element_by_xpath('//h1[@itemprop="name"]').get_attribute('innerText')
        all_reviews_button = DRIVER.find_element_by_xpath('//span[text()="Alle Bewertungen lesen"]')
        all_reviews_button.click()
        time.sleep(SCROLL_PAUSE_TIME)
        last_height = DRIVER.execute_script("return document.body.scrollHeight")
        while True:
            # Scroll to the bottom and click "Mehr anzeigen" (show more) if it is present.
            DRIVER.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            try:
                DRIVER.find_element_by_xpath('//span[text()="Mehr anzeigen"]').click()
            except WebDriverException:
                pass
            time.sleep(SCROLL_PAUSE_TIME)
            new_height = DRIVER.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                # The height no longer changes: all reviews are loaded, so save the page.
                logger.info('Durchlauf erfolgreich')
                innerHTML = DRIVER.execute_script("return document.body.innerHTML")
                with open(app_name + '.html', 'w', encoding='utf-8') as out:
                    out.write(innerHTML)
                break
            last_height = new_height
    except Exception:
        logger.error('Exception occurred', exc_info=True)
    finally:
        DRIVER.quit()
The log file, which shows that the infinite scroll reached the end of the page but the file could not be saved:
10.09.19 16:12:00 - INFO - Durchlauf erfolgreich
10.09.19 16:12:13 - ERROR - Exception occurred
Traceback (most recent call last):
  File "scraper.py", line 57, in <module>
    innerHTML = DRIVER.execute_script("return document.body.innerHTML")
  File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: [Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://marionette/content/proxy.js :: sendReply_ :: line 275" data: no]
The last part of geckodriver.log:
...
1568124670155 Marionette WARN TimedPromise timed out after 500 ms: stacktrace:
bail@chrome://marionette/content/sync.js:223:64
1568124693017 Marionette WARN TimedPromise timed out after 500 ms: stacktrace:
bail@chrome://marionette/content/sync.js:223:64
1568124734637 Marionette INFO Stopped listening on port 57015
[Parent 14684, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
[Child 10464, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
[Parent 14684, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
JavaScript error: resource:///modules/sessionstore/SessionStore.jsm, line 1639: TypeError: subject.QueryInterface is not a function
A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
[Child 2508, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
[Child]
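I want to save the page as a file and, in a next step, parse the HTML to extract the reviews. However, the saving part does not work for large pages. If I break out of the while loop after, say, 100 steps and save the page, it works fine.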
NS_ERROR_FAILURE (0x80004005)
This is the most general of all errors and is raised for any failure to which a more specific error code does not apply.
But this error message...
selenium.common.exceptions.WebDriverException: Message: [Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://marionette/content/proxy.js :: sendReply_ :: line 275" data: no]
...implies that Marionette threw an error while trying to read/store/copy the page_source.
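To tell this failure apart from other WebDriverExceptions in a scraper, the nsresult can be pulled out of the exception message. The following is only a minimal sketch: parse_nsresult is a hypothetical helper name, and it assumes the message keeps the nsresult: "0x... (NAME)" layout shown in the traceback above.

```python
import re

# Matches the 'nsresult: "0x80004005 (NS_ERROR_FAILURE)"' part of a Gecko error message.
_NSRESULT_RE = re.compile(r'nsresult: "(0x[0-9A-Fa-f]+) \((\w+)\)"')


def parse_nsresult(message):
    """Return (hex_code, name) if the message contains an nsresult, else None."""
    match = _NSRESULT_RE.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

In the scraper this could be called as parse_nsresult(str(e)) inside the except block to decide whether to retry with a different way of saving the page.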
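The relevant HTML DOM would have helped us debug the issue better. However, the problem seems to be that the page_source is immensely large and exceeds the maximum that Marionette can handle. Possibly the string you are trying to process is larger than that.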
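One way to check the size hypothesis is to ask the browser only for the length of the serialized DOM, which is a single number and therefore a tiny reply. This is a sketch under that assumption; inner_html_length is a hypothetical helper name, and driver is any object with Selenium's execute_script method.

```python
def inner_html_length(driver):
    """Return the length of document.body.innerHTML in UTF-16 code units.

    Only the number travels back to Python, not the HTML string itself.
    """
    return driver.execute_script("return document.body.innerHTML.length;")
```

Logging this value in every pass of the scroll loop (for example logger.info(inner_html_length(DRIVER))) would show how large the page is at the point where saving starts to fail.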
Solution
A quick solution would be to avoid assigning the page_source to a variable and to print it instead, to find out where the actual issue lies.
print(DRIVER.execute_script("return document.body.innerHTML"))
or
print(DRIVER.page_source)
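If the string really is too large to come back in one reply, another thing that might be worth trying is to fetch it in slices and join them in Python. This is an untested assumption, not a confirmed fix and not part of the Selenium API: read_inner_html_in_chunks, _SLICE_JS and chunk_size are hypothetical names, and the sketch assumes the page no longer changes while it is being read.

```python
# JavaScript run in the page: returns [slice, next_start, total_length].
_SLICE_JS = """
var s = document.body.innerHTML, start = arguments[0];
var end = Math.min(start + arguments[1], s.length);
// Do not cut between the two halves of a UTF-16 surrogate pair (e.g. an emoji).
var c = s.charCodeAt(end - 1);
if (end < s.length && c >= 0xD800 && c <= 0xDBFF) { end += 1; }
return [s.slice(start, end), end, s.length];
"""


def read_inner_html_in_chunks(driver, chunk_size=1000000):
    """Return document.body.innerHTML, fetched in slices of about chunk_size characters."""
    parts = []
    start, total = 0, 1  # total is unknown until the first reply arrives
    while start < total:
        # Each call serializes the DOM again, so this trades speed for small replies.
        chunk, start, total = driver.execute_script(_SLICE_JS, start, chunk_size)
        parts.append(chunk)
    return "".join(parts)
```

The result could then be written to the file in place of innerHTML. Because the DOM is serialized once per slice, a larger chunk_size means fewer but bigger replies.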
Reference
You can find a couple of relevant discussions in:
Outro
Documentation links:
- WebDriver:TakeScreenshot generates error when web page has a big height
- WebDriver:TakeScreenshot fails in canvas "scale()" for huge web pages
- Exception NS_ERROR_FAILURE in ctx.scale() if width or height is greater than 32767
- event.synthesizeMouseAtPoint() should only call nsIDOMWindowUtils.sendMouseEvent() if there is a valid window handle