Getting data within <b> tag with Beautiful Soup or Selenium
I am trying to extract the content inside the <b> tags of this website. I want to pull the results for different cities by entering their addresses. The data I want to capture looks like this:
Query Date: Wed Aug 09 2017
Latitude: 33.4484
Longitude: -112.0740
ASCE 7-10 Windspeeds
(3-sec peak gust in mph*):
Risk Category I: 105
Risk Category II: 115
Risk Category III-IV: 120
MRI** 10-Year: 76
MRI** 25-Year: 84
MRI** 50-Year: 90
MRI** 100-Year: 96
ASCE 7-05 Windspeed:
90 (3-sec peak gust in mph)
ASCE 7-93 Windspeed:
72 (fastest mile in mph)
The code I have tried is shown below.
from bs4 import BeautifulSoup
from datetime import datetime
import dateutil.parser
import urllib2
import requests
import sys
import re
import csv
import pandas as pd
from selenium import webdriver

chrome_path = r"/usr/local/share/chromedriver"
driver = webdriver.Chrome(chrome_path)
driver.get("http://windspeed.atcouncil.org/")  # open the site

driver.find_element_by_xpath('//*[@id="address"]').click()  # select the address radio button
driver.find_element_by_xpath('//*[@id="google-map-address"]').click()  # focus the address textbox

cities = ['pheonix']  # city list
for city in cities:
    driver.find_element_by_xpath('//*[@id="google-map-address"]').send_keys(city)  # type the city
    driver.find_element_by_xpath('//*[@id="searchform"]/div[1]/div[2]/button').click()  # run the search
    driver.find_element_by_xpath('//*[@id="latt"]')  # latitude field
    driver.find_element_by_xpath('//*[@id="searchform"]/div[1]/div[7]/span/input').click()
    x = driver.current_url
    print x
    Data = {'optionCoordinate': '2', 'coordinate_address': cities}
    page = requests.post(x, data=Data)
    soup = BeautifulSoup(page.content, 'html.parser')
    for b_tag in soup.find_all('b'):
        print b_tag.text, b_tag.next_sibling
Please help me find a solution if this can be done with Selenium and Python BS4.
You can simply extract this data using selenium:
from selenium import webdriver as wd
br = wd.Chrome()
br.get(URL) # use url mentioned in question
s = br.find_element_by_id('bodyContent') #search results div
print '\n'.join(s.text.split('\n')[3:22])
Output:
You can process this string data however you need; one way to pull out the <b> labels and their values directly is sketched after the output below.
Query Date: Wed Aug 09 2017
Latitude: 33.4484
Longitude: -112.0740
ASCE 7-10 Windspeeds
(3-sec peak gust in mph*):
Risk Category I: 105
Risk Category II: 115
Risk Category III-IV: 120
MRI** 10-Year: 76
MRI** 25-Year: 84
MRI** 50-Year: 90
MRI** 100-Year: 96
ASCE 7-05 Windspeed:
90 (3-sec peak gust in mph)
ASCE 7-93 Windspeed:
72 (fastest mile in mph)
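If you specifically want the value that sits next to each <b> label, as the find_all('b') attempt in the question does, another option is to let Selenium render the results and then hand driver.page_source to BeautifulSoup instead of re-POSTing the form with requests. The sketch below is a minimal, untested outline of that idea: the element IDs and search flow are copied from the question's code, and waiting on the bodyContent container used in the answer above before parsing is an assumption about when the results are ready.

from bs4 import BeautifulSoup, NavigableString
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(r"/usr/local/share/chromedriver")
driver.get("http://windspeed.atcouncil.org/")

# Fill in the address and run the search, exactly as in the question's code.
driver.find_element_by_xpath('//*[@id="address"]').click()
driver.find_element_by_xpath('//*[@id="google-map-address"]').send_keys('pheonix')
driver.find_element_by_xpath('//*[@id="searchform"]/div[1]/div[2]/button').click()

# Assumption: the results live inside the 'bodyContent' div used in the answer
# above, so wait for it to be present before parsing the rendered page.
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.ID, 'bodyContent')))

# Parse the page Selenium has already rendered instead of POSTing the form again.
soup = BeautifulSoup(driver.page_source, 'html.parser')
results = {}
for b_tag in soup.find_all('b'):
    label = b_tag.get_text(strip=True)
    sibling = b_tag.next_sibling
    value = sibling.strip() if isinstance(sibling, NavigableString) else ''
    if label:
        results[label] = value

for label, value in results.items():
    print('{} {}'.format(label, value))

driver.quit()

For the example city above, the resulting dictionary should contain pairs such as 'Risk Category I:' mapped to '105', matching the output shown earlier.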