Cannot access the complete web page using mechanize

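I am trying to save the usautoforce home page with mechanize. @Ertugrul, following your answer I now get the complete page. But when I try to access the username and password fields, it raises an error. I have already set every control to not read-only. When I open the saved page in an editor, there is no HTML that refers to the username and password fields. Here is my mechanize code: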

import re
import time

import mechanize

br = mechanize.Browser()

# Configure the browser before the first request.
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_robots(False)  # ignore robots.txt
br.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'),
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Upgrade-Insecure-Requests', '1'),
    ('Connection', 'keep-alive'),
]

br.open("http://www.usautoforce.com/Pages/home.aspx")
print(br.response())  # call response(); the bare attribute only prints the bound method
time.sleep(9)

latest_index = 0
html_replaced = ""
html = br.response().read()


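# Prefix every root-relative href/src value with the site origin.
for m in re.finditer('(href|src)(=")(/[^"]+")', html):
    html_replaced += html[latest_index:m.start()] + m.group(1) + m.group(2) + 'http://www.usautoforce.com' + m.group(3)
    latest_index = m.end()
# Append the text after the last match; without this the end of the page is lost.
html_replaced += html[latest_index:]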


with open("us.html", "w") as f:
    f.write(html_replaced)

# Print the first form that mechanize parsed from the page.
print(list(br.forms())[0])

br.select_form(nr=0)
time.sleep(2)

#for control in br.form.controls:
 #   print control
  #  print "type=%s, name=%s value=%s" % (control.type, control.name, br[control.name])

br.form.set_all_readonly(False)
br.form["nexpartuname"] = "abc"

br.form["pwd"] = "xyz"
br.submit()

Here is the error:

  File "haha.py", line 60, in <module>
    br.form["nexpartuname"] = "clack"
  File "/usr/lib/python2.7/site-packages/mechanize/_form.py", line 2775, in __setitem__
    control = self.find_control(name)
  File "/usr/lib/python2.7/site-packages/mechanize/_form.py", line 3096, in find_control
    return self._find_control(name, type, kind, id, label, predicate, nr)
  File "/usr/lib/python2.7/site-packages/mechanize/_form.py", line 3180, in _find_control
    raise ControlNotFoundError("no control matching "+description)
mechanize._form.ControlNotFoundError: no control matching name 'nexpartuname'

Mechanize does not execute JavaScript. The site you are trying to access also displays 'Please enable scripts...'.
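One way to check whether a field exists in the markup the server sends, as opposed to being added later by a script, is to list the input names in the raw HTML. The following is only a sketch in Python 3 using the standard library; the helper `input_names` and the `sample` markup are made up for illustration and are not the actual usautoforce page source.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect the name attribute of every <input> tag in static HTML."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def input_names(html):
    """Return the names of all <input> elements present in the markup."""
    collector = InputNameCollector()
    collector.feed(html)
    return collector.names


# Hypothetical page source: the login input is written by a script,
# so it is not part of the markup a non-JavaScript client sees.
sample = """
<form action="/search">
  <input type="text" name="q">
  <script>document.write('<input name="nexpartuname">');</script>
</form>
"""

names = input_names(sample)
print(names)
print("nexpartuname" in names)
```

If a name such as nexpartuname is missing from this list for the real page, a ControlNotFoundError from mechanize is the expected result, because mechanize only sees the same static markup.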

Since JavaScript cannot be enabled in mechanize, I personally recommend using PhantomJS.

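But the real problem here is not JavaScript, it is the URLs. Because the URLs on that site are relative, the HTML does not behave as expected when you download it and open it locally.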

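You have to convert all relative URLs to absolute URLs. Run this code before writing the HTML to the file, and write the html_replaced string to the file instead of the html string.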

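import re

latest_index = 0
html_replaced = ""

# html holds the page source, e.g. html = br.response().read()
for m in re.finditer('(href|src)(=")(/[^"]+")', html):
    html_replaced += html[latest_index:m.start()] + m.groups()[0]+m.groups()[1] + 'http://www.usautoforce.com' + m.groups()[2]
    latest_index = m.end()
# Append whatever follows the last match, otherwise the tail of the page is dropped.
html_replaced += html[latest_index:]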
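The loop above only handles double-quoted values that start with a single slash, and it would also rewrite protocol-relative URLs such as //cdn.example.com/x incorrectly. As a possible alternative, here is a sketch in Python 3 that resolves each href/src value with urljoin from the standard library. The function name `absolutize`, the regular expression, and the sample markup are illustrative assumptions; a real HTML parser would be more robust than a regex.

```python
import re
from urllib.parse import urljoin

# Page the HTML was downloaded from; relative URLs are resolved against it.
BASE_URL = "http://www.usautoforce.com/Pages/home.aspx"

# Matches href="..." or src='...' but not attributes like data-src.
ATTR_RE = re.compile(r'(?<![\w-])(href|src)(\s*=\s*)(["\'])(.*?)\3', re.IGNORECASE)


def absolutize(html, base_url=BASE_URL):
    """Return html with every quoted href/src value made absolute."""

    def fix(match):
        attr, equals, quote, url = match.groups()
        # urljoin leaves absolute URLs and schemes like javascript: untouched.
        return attr + equals + quote + urljoin(base_url, url) + quote

    return ATTR_RE.sub(fix, html)


# Hypothetical markup covering root-relative, protocol-relative,
# document-relative and already-absolute URLs.
sample = (
    '<a href="/Pages/login.aspx">Log in</a>'
    '<img src="//cdn.example.com/logo.png">'
    '<script src="js/app.js"></script>'
    '<a href="https://example.com/">External</a>'
)

print(absolutize(sample))
```

Because ATTR_RE.sub copies all text between matches, nothing after the last match is lost, which is the case the manual loop has to handle explicitly.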