Fix character encoding of a webpage using Python mechanize

Question

I am trying to submit a form on this page using mechanize.

import mechanize

br = mechanize.Browser()
br.open("http://mspc.bii.a-star.edu.sg/tankp/run_depth.html")
# select the first form on the page
br.select_form(nr=0)
# fill in the form input
br['pdb_id'] = '1atp'
req = br.submit()

However, this gives the following error:

mechanize._form.ParseError: expected name token at '<! INPUT PDB FILE>\n\t'

I think this is caused by some faulty character encoding (ref). I would like to know how to fix it.

Answer 1

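Your problem is not the character encoding but some broken HTML comment tags (such as the '<! INPUT PDB FILE>' shown in the error), which make the page invalid HTML that mechanize's default parser can't read. You can use the included BeautifulSoup-based parser instead, which worked in my case (Python 2.7.9, mechanize 0.2.5):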

#!/usr/bin/env python
#-*- coding: utf-8 -*-
import mechanize

br = mechanize.Browser(factory=mechanize.RobustFactory())
br.open('http://mspc.bii.a-star.edu.sg/tankp/run_depth.html')
br.select_form(nr=0)
br['pdb_id'] = '1atp'
response = br.submit()
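
As an alternative to switching parsers, the broken markup could in principle be repaired before it is parsed. The following is only a minimal sketch using the standard library: the function name fix_broken_comments and the regular expression are my own assumptions, based solely on the '<! INPUT PDB FILE>' fragment in the error message, and it has not been tried against the live page.

```python
import re

# Matches a malformed declaration such as "<! INPUT PDB FILE>":
# "<!" followed by whitespace, then text up to the next ">".
# Valid comments ("<!-- ... -->") and "<!DOCTYPE ...>" have no
# whitespace right after "<!", so they are left untouched.
_BROKEN_COMMENT = re.compile(r'<!\s+([^<>]*?)\s*>')


def fix_broken_comments(html):
    """Rewrite '<! text>' as a well-formed '<!-- text -->' comment.

    Hypothetical helper for illustration; not part of mechanize.
    """
    return _BROKEN_COMMENT.sub(r'<!-- \1 -->', html)


if __name__ == '__main__':
    sample = '<form><! INPUT PDB FILE>\n\t<input name="pdb_id"></form>'
    print(fix_broken_comments(sample))
```

With mechanize, the cleaned markup would then have to be put back into the browser before calling select_form. As far as I know this is done by taking br.response(), calling set_data() on it with the repaired HTML, and passing it to br.set_response(), but check that against the documentation for your mechanize version.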

