How can I use Python to get elements that do not appear in the HTML source but do appear in Chrome's "Inspect Element" tool?
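Dear Python experts,
I'm completely new to Python and am writing a small program to get information from a web page. If the page returned all of its information in the page-source HTML, which is easy to view in Chrome, I would have nothing to ask. The problem is that the elements I want after submitting an IP address to https://www.maxmind.com/en/geoip-demo do not appear in the HTML body; they only show up when I use Chrome's "Inspect Element" tool. I used the code below to POST to the page and print the response, but the elements I want are not there.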
import requests

url = 'https://www.maxmind.com/en/geoip-demo'
data = {'addresses': '162.237.72.200'}
# POST the IP address to the demo page and print the raw response body.
response = requests.post(url, data=data)
print(response.text)
With this code, I expected to find information about the IP address in the HTML body, such as:
162.237.72.200
US
Pittsburg,California,United States,North America
94565
38.0051,
-121.8387
AT&T U-verse
AT&T U-verse
sbcglobal.net
807
But this information is not in the HTML body, so I would really appreciate a hint on how to solve this. Thanks a lot!
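A likely explanation, offered as an assumption rather than something confirmed from MaxMind's source: `requests` only receives the HTML the server sends, while Chrome's "Inspect Element" shows the live DOM after the page's JavaScript has run, so the result rows are probably inserted client-side. The sketch below shows one way to check this with the standard library. `STATIC_HTML` is a made-up stand-in for the response body (only the ids are taken from the answer below), and `TextInsideId` and `text_inside_id` are hypothetical helper names.

```python
from html.parser import HTMLParser


class TextInsideId(HTMLParser):
    """Collect the text found inside the first element with a given id."""

    # Void elements never get a closing tag, so they must not affect depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target element
        self.found = False  # True once the target element has been seen
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def text_inside_id(html, target_id):
    """Return (found, text) for the element whose id equals target_id."""
    parser = TextInsideId(target_id)
    parser.feed(html)
    parser.close()
    return parser.found, " ".join(parser.chunks)


# Made-up stand-in for what requests would return: the table body exists
# but is empty, because a script is expected to fill it in later.
STATIC_HTML = """
<form id="geoip-demo-form"><input id="addresses"><button>Submit</button></form>
<table><tbody id="geoip-demo-results-tbody"></tbody></table>
"""

found, text = text_inside_id(STATIC_HTML, "geoip-demo-results-tbody")
print(found, repr(text))
```

If the real response gives `found` as true but an empty `text`, the data is most likely added by JavaScript after the page loads, and a plain HTTP POST will not see it.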
Here is a working solution that uses Scrapy and Selenium WebDriver to simulate browser navigation and form interaction in order to retrieve the data.
import time

import scrapy
from selenium import webdriver
from selenium.webdriver.common.by import By


class MaxSpider(scrapy.Spider):
    name = "max"
    allowed_domains = ["maxmind.com"]
    start_urls = ["https://www.maxmind.com/en/geoip-demo"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # A real browser is needed because the results are filled in by JavaScript.
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        # Type the IP address into the form's input field.
        address_field = self.driver.find_element(By.ID, "addresses")
        address_field.click()
        address_field.send_keys("62.237.72.200")
        # Submit the form; the page then loads the results asynchronously.
        submit = self.driver.find_element(By.XPATH, '//*[@id="geoip-demo-form"]/button')
        submit.click()
        # Crude fixed wait for the results table to be filled in.
        time.sleep(3)
        for element in self.driver.find_elements(By.ID, "geoip-demo-results-tbody"):
            print(element.text)
        self.driver.quit()
Output excerpt:
2015-01-13 13:27:18+0100 [max] DEBUG: Crawled (200) <GET https://www.maxmind.com/en/geoip-demo> (referer: http://www.bing.com)
62.237.72.200 FI Finland, Europe 60.1708,
24.9375 Tele Danmark Tele Danmark
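The fixed `time.sleep(3)` in the spider either wastes time or turns out too short on a slow connection. A common alternative is to poll until the results table actually contains text, giving up after a timeout. Selenium ships `WebDriverWait` for this purpose; the sketch below shows the same idea in plain Python so it can be tried without a browser. `wait_until` and the `FakeTable` stand-in are hypothetical names, not part of Selenium or Scrapy.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.2):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


class FakeTable:
    """Stand-in for a results element whose text appears after a few polls."""

    def __init__(self, empty_polls, final_text):
        self.remaining = empty_polls
        self.final_text = final_text

    @property
    def text(self):
        # Return an empty string for the first `empty_polls` reads.
        if self.remaining > 0:
            self.remaining -= 1
            return ""
        return self.final_text


table = FakeTable(empty_polls=3, final_text="62.237.72.200 FI Finland, Europe")
print(wait_until(lambda: table.text, timeout=2.0, poll_interval=0.01))
```

With a real driver, the condition would be a callable that returns the `.text` of the `geoip-demo-results-tbody` element, placed where the spider currently sleeps.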