Yandex.Weather parsing
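I'm trying to download the 7-day forecast from https://www.yandex.com/weather/moscow. The problem is that every day except today uses the same class. How can I get the forecast for 7 days (or at least for several of them)?
I'm using the BeautifulSoup library. I can get today's weather, but I'm stuck on the other days.
Here is my code: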
import urllib.request
from bs4 import BeautifulSoup

def get_html(url):
    response = urllib.request.urlopen(url)
    return response.read()

def parse_today(html):
    soup = BeautifulSoup(html, "html.parser")
    temp = soup.find('div', class_='temp fact__temp fact__temp_size_s').get_text().encode('utf-8').decode('utf-8', 'ignore')
    return temp

def parse_next_day(day_num, html):
    # ?????
    pass

def main():
    temp = parse_today(get_html('https://yandex.ru/weather/moscow'))
    print("Now the temperature is: ", temp)
    for i in range(1,6):
        next_temp = parse_next_day(i+1, get_html('https://yandex.ru/weather/moscow'))
        print("The day", i+1, "temperature is : ", next_temp)

if __name__ == '__main__':
    main()
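The data is loaded dynamically from a URL that you can find in the browser's network tab. That URL returns HTML. You can use the CSS selector .card:not(.adv) to isolate the per-day blocks of the forecast. This requires bs4 4.7.1+. An example that parses out the "feels like" temperatures: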
import requests
from bs4 import BeautifulSoup as bs

# Endpoint found in the network tab; the 'lxml' parser needs the lxml package installed.
r = requests.get('https://www.yandex.com/weather/segment/details?offset=0&lat=55.753215&lon=37.622504&geoid=213&limit=10', headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.content, 'lxml')

# One .card per forecast day; :not(.adv) skips the advertising cards.
for card in soup.select('.card:not(.adv)'):
    # Day number and month name, matched by class-name suffix.
    date = ' '.join([i.text for i in card.select('[class$=number],[class$=month]')])
    print(date)
    # Pair each part of the day with its "feels like" temperature.
    temps = list(zip(
        [i.text for i in card.select('.weather-table__daypart')],
        [i.text for i in card.select('.weather-table__body-cell_type_feels-like .temp__value')]
    ))
    print(temps)
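The URL above hardcodes the location values for Moscow. If you want other locations or a different number of days, a minimal sketch for assembling the same URL from its parts might look like this. build_forecast_url is a hypothetical helper, and the meaning of each query parameter is only an assumption based on its name (offset and limit as the window of days, lat, lon and geoid as the location), since the endpoint is not a documented API and may change.

```python
from urllib.parse import urlencode

# Endpoint copied from the answer above; undocumented, so it may change.
BASE_URL = 'https://www.yandex.com/weather/segment/details'


def build_forecast_url(lat, lon, geoid, limit=10, offset=0):
    """Hypothetical helper: rebuild the forecast URL from its query parameters."""
    # Assumption: each parameter means what its name suggests.
    params = {
        'offset': offset,
        'lat': lat,
        'lon': lon,
        'geoid': geoid,
        'limit': limit,
    }
    return BASE_URL + '?' + urlencode(params)


# Moscow values taken from the URL in the answer.
moscow_url = build_forecast_url(lat=55.753215, lon=37.622504, geoid=213)
print(moscow_url)
```

The result can be passed to requests.get in place of the literal string. The values for another city would have to be read from the network tab in the same way.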