Python BeautifulSoup returns only a key such as {temperature}, not the value
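Hi, I spent several hours today trying to scrape some data from this website: http://www.buienradar.nl/weer/kingston/jm/3489854/5daagse
I am trying to get the data inside the orange box (the weather data).
I am on Python 3 and using bs4.
Whatever I try, I only get key-like results such as {temperature}.
How do I get the actual values?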
from bs4 import BeautifulSoup
import requests
url = "http://www.buienradar.nl/weer/kingston/jm/3489854/5daagse"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
letters = soup.find_all("div", class_="forecast")
tempe = soup.find(class_='temperature').attrs
print(tempe)
table = soup.find(class_='precipitation').attrs
print(table)
heds = soup.find_all('table')
for h in heds:
    m = h.find_all('td')
    print(m)
    for o in m:
        print(o.text)
The result is:
{'class': ['temperature']}
{'class': ['precipitation']}
[<td>{time}</td>, <td><img data-url="/resources/images/icons/weather/30x30/{iconcode}.png" src=""/></td>, <td><span class="temperature">{temperature}°C</span></td>, <td>{feeltemperature}°C</td>, <td>{winddirection} {beaufort}</td>, <td style="text-align:left;"><img data-url="/resources/images/icons/wind/{winddirection}.png" src="" style="width:20px;"/></td>, <td class="precipitation">{precipation}%</td>, <td>{precipationmm} mm</td>, <td>{sunshine}%</td>]
{time}
{temperature}°C
{feeltemperature}°C
{winddirection} {beaufort}
{precipation}%
{precipationmm} mm
{sunshine}%
Process finished with exit code 0
What am I doing wrong? Thanks in advance.
Edit: thanks to the answers I received, this now works:
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
import requests
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from time import sleep
driver = webdriver.Firefox(executable_path=r'path/to/selenium')
url = "http://www.buienradar.nl/weer/kingston/jm/3489854/5daagse"
driver.get(url)
WebDriverWait(driver, 10).until( EC.visibility_of_element_located((By.CLASS_NAME, "forecast")))
print("access")
sleep(1)
html_page = driver.page_source
driver.quit()
soup = BeautifulSoup(html_page, "lxml")
letters = soup.find_all("div", class_="forecast")
tempe = soup.find(class_='temperature').attrs
print(tempe)
table = soup.find(class_='precipitation').attrs
print(table)
heds = soup.find_all('table')
for h in heds:
    m = h.find_all('td')
    print(m)
    for o in m:
        print(o.text)
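You are not doing anything wrong; you are just not doing everything a browser does. In particular, when you fetch the URL, the site serves only a "template" and relies on JavaScript to fill in the template values. If you open the Network tab in Chrome, you will see a bunch of requests. Specifically, https://static.buienradar.nl/resources/js/v/1.0.22/buienradar.min.js performs a series of replacements, including {temperature} and {feeltemperature}.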
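A quick way to confirm this diagnosis before reaching for a browser is to check whether the fetched HTML still contains unfilled placeholders. The following is only a minimal sketch: the helper name find_placeholders and the regular expression are my own, and the sample string is a shortened copy of the output shown in the question, not a live response.

```python
import re

# Shortened copy of the raw HTML from the question's output (not fetched live).
raw_html = (
    '<td>{time}</td>'
    '<td><span class="temperature">{temperature}°C</span></td>'
    '<td>{feeltemperature}°C</td>'
)

# Assumption: placeholders are lowercase names wrapped in curly braces.
PLACEHOLDER = re.compile(r"\{([a-z]+)\}")


def find_placeholders(html):
    """Return the names of unfilled {name} placeholders, in order of appearance."""
    return PLACEHOLDER.findall(html)


placeholders = find_placeholders(raw_html)
if placeholders:
    # The page is still a template: the values are filled in later by JavaScript.
    print("Unfilled template fields:", placeholders)
else:
    print("No placeholders found; the values are in the HTML.")
```

If the list is non-empty for r.text, no amount of BeautifulSoup work on that response will produce the values; they have to come from a rendered page or from the request the script makes.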
If you are looking for something like the temperature, you would do this:
temp = soup.find_all('span', {'class': 'temperature'})
# It's not spelled correctly, make sure you take that into account
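The issue here does not seem to be your code; the temperature is generated by JavaScript or something similar. It is dynamic, so you will have to use something like Selenium (an automated browser).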
If you make the request directly to the site's API, you can get the data you need faster and without Selenium:
import requests
url = 'https://api.buienradar.nl/data/forecast/1.1/all/3489854'
# Get json response
data = requests.get(url).json()
# Parse json response
for day in data['days']:
    if 'hours' in day:
        print(day['date'])
        for hour in day['hours']:
            print('Hour - {}.00 and Precipitation - {} mm'.format(hour['hour'], hour['precipationmm']))
# 2017-05-05T00:00:00
# Hour - 21.00 and Precipitation - 0.0 mm
# Hour - 22.00 and Precipitation - 0.0 mm
# Hour - 23.00 and Precipitation - 0.0 mm
# 2017-05-06T00:00:00
# Hour - 0.00 and Precipitation - 0.0 mm
# Hour - 1.00 and Precipitation - 0.0 mm
# Hour - 2.00 and Precipitation - 0.0 mm
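If you want to reuse the parsing without hitting the network each time, the loop above can be pulled into a small function. This is a sketch under assumptions: the function name hourly_precipitation is hypothetical, and the sample dictionary is hand-written to mirror only the keys used above ('days', 'date', 'hours', 'hour', 'precipationmm'); the real response may contain more fields or change shape.

```python
def hourly_precipitation(data):
    """Collect (date, hour, precipitation in mm) tuples from a forecast dict.

    Days without an 'hours' list are skipped, as in the loop above.
    """
    rows = []
    for day in data.get("days", []):
        for hour in day.get("hours", []):
            rows.append((day["date"], hour["hour"], hour["precipationmm"]))
    return rows


# Hand-written sample mirroring the keys used above; not real API output.
sample = {
    "days": [
        {
            "date": "2017-05-05T00:00:00",
            "hours": [
                {"hour": 21, "precipationmm": 0.0},
                {"hour": 22, "precipationmm": 0.0},
            ],
        },
        {"date": "2017-05-06T00:00:00"},  # a day with no hourly data
    ]
}

for date, hour, mm in hourly_precipitation(sample):
    print("{} hour {}: {} mm".format(date, hour, mm))
```

With live data you would pass requests.get(url).json() instead of sample. Note that the key really is spelled 'precipationmm', matching the placeholder in the page template.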