mechanicalsoup - How to input into a single text box

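The site I am trying to parse has only a single input box, with no form around it. I am having trouble defining that single input box, passing an address to it, and then submitting.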

What I want to do is enter an address, submit it, scrape the information under id="A18" title="Click to get bulk trash pick up info", and load it into JSON.

Python:

import mechanicalsoup

# URL of the map page we want to query
map_url = "http://mapservices.phoenix.gov/gis/imap/iMap.html"
address = "<address>"
json_file = "/home/pi/bulk_pickup.json"

# Setup browser
browser = mechanicalsoup.StatefulBrowser(
    soup_config={'features': 'lxml'},
    user_agent='Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13',
)

# Open the map page
map_page = browser.get(map_url)

# Similar to assert map_page.ok but with full status code in case of failure.
map_page.raise_for_status()

search_form = mechanicalsoup.Form(map_page.soup.select_one('input[id="search_input"]'))

search_form.input({'search_input': address})

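Unfortunately, the page http://mapservices.phoenix.gov/gis/imap/iMap.html appears to make heavy use of JavaScript. The <input ...> tag you see is not even part of a <form>, and MechanicalSoup needs the form's action= attribute to know where to submit it. Either you hack the low-level details yourself (but MechanicalSoup will not help much compared with using the bare requests library), or you need a higher-level solution such as Selenium.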

For details, see http://mechanicalsoup.readthedocs.io/en/stable/faq.html#when-to-use-mechanicalsoup
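To make the "not part of a <form>" point concrete, here is a minimal sketch of how one might check whether an input has an enclosing form, and where a plain form submission would go if it does. The two HTML snippets, the example.com URL, and the names FormFinder and submit_target are made up for illustration; they are not taken from the Phoenix page or from MechanicalSoup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFinder(HTMLParser):
    """Find the <form> (if any) that encloses the <input> with a given id."""

    def __init__(self, input_id):
        super().__init__()
        self.input_id = input_id
        self.open_forms = []   # action values of the <form> tags currently open
        self.in_form = False   # True once the input is seen inside a <form>
        self.action = None     # action= of that enclosing form, if present

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.open_forms.append(attrs.get("action"))
        elif tag == "input" and attrs.get("id") == self.input_id:
            if self.open_forms:
                self.in_form = True
                self.action = self.open_forms[-1]

    def handle_endtag(self, tag):
        if tag == "form" and self.open_forms:
            self.open_forms.pop()


def submit_target(html, page_url, input_id):
    """Return the URL a plain form submission would hit, or None without a form."""
    finder = FormFinder(input_id)
    finder.feed(html)
    if not finder.in_form:
        return None
    # A relative action= is resolved against the URL of the page itself.
    return urljoin(page_url, finder.action or "")


# Hypothetical pages: one driven by JavaScript, one with a classic form.
PAGE_URL = "http://example.com/map.html"
JS_STYLE = '<div><input id="search_input" type="text"></div>'
FORM_STYLE = (
    '<form action="/search" method="get">'
    '<input id="search_input" name="q"></form>'
)

print(submit_target(JS_STYLE, PAGE_URL, "search_input"))    # None
print(submit_target(FORM_STYLE, PAGE_URL, "search_input"))  # http://example.com/search
```

The sketch uses only the standard library so it runs offline. On a page already loaded with MechanicalSoup, calling find_parent('form') on the BeautifulSoup tag for the input should answer the same question; a result of None means there is nothing for MechanicalSoup to submit.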

If the page were more "HTMLy" and less "JavaScripty", you could write something like:

browser.open(map_url)
browser.select_form(...)
browser["search_input"] = ...
browser.submit_selected()