Get unaltered HTML via Selenium
I'm using python/selenium/headless geckodriver to scrape a page, but how can I get the unaltered HTML, exactly as it was downloaded, before JS starts manipulating elements? Here's what I tried:
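from selenium import webdriver

fireFoxOptions = webdriver.FirefoxOptions()
# Pass the flag directly; the `headless` attribute is deprecated in newer Selenium 4 releases
fireFoxOptions.add_argument("-headless")
driver = webdriver.Firefox(options=fireFoxOptions)
driver.get(url)
# page_source reflects the DOM after scripts have run, not the HTML as downloaded
print(driver.page_source)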
Disabling JavaScript in the Firefox profile seems to be the way to do it:
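from selenium import webdriver

profile = webdriver.FirefoxProfile()
# Turn off JavaScript by overriding the default ("frozen") preference directly
profile.DEFAULT_PREFERENCES['frozen']['javascript.enabled'] = False
# Keep Firefox from auto-updating during the run
profile.set_preference("app.update.auto", False)
profile.set_preference("app.update.enabled", False)
profile.update_preferences()
options = webdriver.FirefoxOptions()
options.profile = profile
options.add_argument("-headless")  # run without a visible window
driver = webdriver.Firefox(options=options)
url = 'https://www.somewhere.com/some/path'
driver.get(url)
print(driver.page_source)  # markup with no page scripts executed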