Retrieve MechanicalSoup results after submitting a form
I'm struggling to retrieve some results from a simple form submission. This is what I have so far:
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.set_verbose(2)
url = "https://www.dermcoll.edu.au/find-a-derm/"
browser.open(url)
form = browser.select_form("#find-derm-form")
browser["postcode"] = 3000
browser.submit_selected()
form.print_summary()
Where do the results end up...?
Many thanks
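According to the MechanicalSoup FAQ, the library should not be used for dynamic forms that rely on JavaScript, which appears to be the case for the website in your example.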
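As for where the results end up in general: on a form that works without JavaScript, they are not stored on the form object, so `form.print_summary()` only prints the form's own fields. `submit_selected()` returns the `requests` response, and the browser's current page becomes the result page. A minimal sketch of that flow follows; the helper name `submit_and_get_results` and its parameters are made up for illustration, and `browser` is assumed to be a `mechanicalsoup.StatefulBrowser`.

```python
def submit_and_get_results(browser, url, form_selector, fields):
    """Submit a static (non-JavaScript) form and return (response, page).

    `browser` is assumed to be a mechanicalsoup.StatefulBrowser;
    this helper is a hypothetical illustration, not part of the library.
    """
    browser.open(url)
    browser.select_form(form_selector)
    for name, value in fields.items():
        browser[name] = value
    # submit_selected() returns the requests.Response of the submission
    response = browser.submit_selected()
    # the browser has followed the submission, so browser.page is the
    # BeautifulSoup object of the page that came back
    return response, browser.page
```

With a real browser this would be called as `submit_and_get_results(mechanicalsoup.StatefulBrowser(), url, "#find-derm-form", {"postcode": "3000"})`. On the site in the question the returned page will most likely not contain the results, because they are filled in by JavaScript after the page loads.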
For this site, you can instead use Selenium in combination with BeautifulSoup (and a little bit of help from webdriver-manager) to get the results you want. A short example:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
# set up the Chrome driver instance using webdriver_manager
# (Selenium 4 takes the driver path through a Service object)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
# navigate to the page
driver.get("https://www.dermcoll.edu.au/find-a-derm/")
# find the postcode input and enter your desired value
postcode_input = driver.find_element(By.NAME, "postcode")
postcode_input.send_keys("3000")
# find the search button (it carries both classes) and perform the search
search_button = driver.find_element(By.CSS_SELECTOR, ".search-btn.location_derm_search_icon")
search_button.click()
# wait up to 10 seconds for the JavaScript-rendered result cards to appear
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "#search_result .address_sec_contents"))
)
# get all search results and load them into a BeautifulSoup object for parsing
search_results = driver.find_element(By.ID, "search_result")
search_results = search_results.get_attribute("innerHTML")
search_results = BeautifulSoup(search_results, "html.parser")
# get individual result cards
search_results = search_results.find_all("div", {"class": "address_sec_contents"})
# now you can parse for whatever information you need
names = [x.find("h4") for x in search_results]
qualifications = [x.find("p", {"class": "qualification"}) for x in search_results]
addresses = [x.find("address") for x in search_results]
# close the browser when done
driver.quit()
Although this approach may look more complicated, it is more robust and can easily be reused in many other cases where MechanicalSoup falls short.
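To make the parsing step reusable, it can be pulled out into a function that takes the HTML string (for example the `innerHTML` obtained from Selenium) and returns plain text fields. This is only a sketch: the function name `parse_result_cards` and the sample markup are made up for illustration, and the real page structure may differ from what is assumed here.

```python
from bs4 import BeautifulSoup


def parse_result_cards(html):
    """Return one dict of text fields per result card found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.find_all("div", class_="address_sec_contents")

    def text_of(tag):
        # find() returns None when a card lacks the element
        return tag.get_text(strip=True) if tag is not None else None

    return [
        {
            "name": text_of(card.find("h4")),
            "qualification": text_of(card.find("p", class_="qualification")),
            "address": text_of(card.find("address")),
        }
        for card in cards
    ]


# Hypothetical markup, invented for illustration only.
SAMPLE_HTML = """
<div class="address_sec_contents">
  <h4>Dr Example Name</h4>
  <p class="qualification">Example qualification</p>
  <address>1 Example St, Melbourne VIC 3000</address>
</div>
<div class="address_sec_contents">
  <h4>Dr Another Name</h4>
</div>
"""

results = parse_result_cards(SAMPLE_HTML)
```

Returning `None` for missing elements keeps one incomplete card from raising an `AttributeError` and aborting the whole parse.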