Error when trying to open a webpage with mechanize
I'm trying to learn mechanize so I can later build a chat-log bot, so I tested some basic code:
import mechanize as mek
import re
br = mek.Browser()
br.open("google.com")
However, whenever I run it, I get this error:
Traceback (most recent call last):
File "/home/runner/.local/share/virtualenvs/python3/lib/python3.7/site-packages/mechanize/_mechanize.py", line 262, in _mech_open
url.get_full_url
AttributeError: 'str' object has no attribute 'get_full_url'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 5, in <module>
br.open("google.com")
File "/home/runner/.local/share/virtualenvs/python3/lib/python3.7/site-packages/mechanize/_mechanize.py", line 253, in open
return self._mech_open(url_or_request, data, timeout=timeout)
File "/home/runner/.local/share/virtualenvs/python3/lib/python3.7/site-packages/mechanize/_mechanize.py", line 269, in _mech_open
raise BrowserStateError("can't fetch relative reference: "
mechanize._mechanize.BrowserStateError: can't fetch relative reference: not viewing any document
I double-checked the documentation on the mechanize page and my code seems consistent with it. What am I doing wrong?
You have to include a scheme, otherwise mechanize thinks you are trying to open a local/relative path (as the error suggests).

br.open("google.com") should be br.open("http://google.com").

You will then see a different error, mechanize._response.httperror_seek_wrapper: HTTP Error 403: b'request disallowed by robots.txt', because google.com disallows crawlers. This can be remedied by calling br.set_handle_robots(False) before open.
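The scheme requirement can also be handled up front. A minimal sketch using only the standard library's urllib.parse (the helper name ensure_scheme is my own, not part of mechanize) that prepends http:// when a URL lacks a scheme:

```python
from urllib.parse import urlparse

def ensure_scheme(url, default="http"):
    """Prepend a scheme if the URL has none, so Browser.open()
    doesn't treat it as a relative reference."""
    if urlparse(url).scheme:
        return url
    return f"{default}://{url}"

print(ensure_scheme("google.com"))           # http://google.com
print(ensure_scheme("https://example.com"))  # unchanged
```

You could then pass ensure_scheme(url) to br.open() and accept bare hostnames without hitting the BrowserStateError.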