Scraping data from a site where the URL doesn't change while changing options in a drop-down list

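I'm using BeautifulSoup to scrape the "Antwerp Weather History for 1 April 2017" table on this webpage. But I need more than this one date: I need every day of April 2017, and the days are listed in a drop-down list: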

In the inspector, it is a select tag with options like these:

I can get their values with the following code:

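import requests
from bs4 import BeautifulSoup

prefix = 'https://www.timeanddate.com'
weather_request = requests.get(prefix + '/weather/belgium/antwerp/historic?month=4&year=2017')
# the parser name is an argument of BeautifulSoup, not of requests.get()
weather = BeautifulSoup(weather_request.content, 'html.parser')

options = []  # (value, label) pairs from the drop-down
for option in weather.select('select > option'):
    options.append((option.get('value'), option.text))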
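Since the live page's markup can change, here is an offline sketch of the same option extraction, run against a small hypothetical select snippet. The snippet, its labels, and the assumption that each option value is a YYYYMMDD date string are illustrative only; check the real values in the inspector before relying on them.

```python
from datetime import datetime

from bs4 import BeautifulSoup

# Hypothetical stand-in for the drop-down on the page; the real markup,
# option values and labels may differ.
SAMPLE_HTML = """
<select>
  <option value="20170401">1 April 2017</option>
  <option value="20170402">2 April 2017</option>
  <option value="20170403">3 April 2017</option>
</select>
"""


def extract_options(html):
    """Return (value, label) pairs for every <option> that is a direct child of a <select>."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (option.get("value"), option.get_text(strip=True))
        for option in soup.select("select > option")
    ]


options = extract_options(SAMPLE_HTML)

# Assumption: option values are YYYYMMDD date strings.
dates = [datetime.strptime(value, "%Y%m%d").date() for value, _ in options]
for (value, label), day in zip(options, dates):
    print(value, label, day.isoformat())
```

If the values really are dates, they can be fed directly into the date parameter of the Ajax request described in the answer.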

Can you help me scrape the tables for these values, given that the URL doesn't change when I change the option in the drop-down list?

I found some other similar questions, but none of them were about BeautifulSoup.

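The data is loaded via Ajax from a different URL. The response is not JSON but a raw JavaScript literal (its object keys are unquoted), so it needs some preprocessing before it can be parsed.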
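A minimal offline sketch of both steps: building one Ajax URL per day of the month (with calendar.monthrange instead of a hard-coded range, so months with 28, 29 or 31 days also work) and turning the JavaScript literal into JSON by quoting its bare keys. The endpoint and query parameters are copied from the URL in the answer's code; the helper names day_urls and js_to_json and the SAMPLE payload are hypothetical, and no request is sent.

```python
import calendar
import json
import re
from urllib.parse import urlencode

# Endpoint and query parameters copied from the answer's URL; the site may change them.
AJAX_URL = "https://www.timeanddate.com/scripts/cityajax.php"


def day_urls(place, year, month):
    """Yield one Ajax URL per day of the month, with the date in the hd parameter as YYYYMMDD."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    for day in range(1, last_day + 1):
        params = {
            "n": place,
            "mode": "historic",
            "hd": f"{year}{month:02d}{day:02d}",
            "month": month,
            "year": year,
            "json": 1,
        }
        # safe="/" keeps "belgium/antwerp" readable instead of encoding it as %2F
        yield AJAX_URL + "?" + urlencode(params, safe="/")


def js_to_json(raw):
    """Quote the bare c/h/s object keys so the JavaScript literal becomes valid JSON."""
    return json.loads(re.sub(r"\b(c|h|s):", r'"\1":', raw))


# Hypothetical, heavily shortened payload in the shape the answer's code expects.
SAMPLE = '[{c:[{h:"12:20 am<br>Sat, Apr 1"},{h:"50 °F"},{h:"Clear.",s:"x"}]}]'

urls = list(day_urls("belgium/antwerp", 2017, 4))
print(len(urls), urls[0])
print(js_to_json(SAMPLE)[0]["c"][2]["h"])
```

The \b in the pattern is a small safeguard so that only whole keys are quoted; like the answer's regex, it still assumes that no cell text contains a lone c:, h: or s:.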

For example:

import re
import json
from io import StringIO

import requests
import pandas as pd
from bs4 import BeautifulSoup


for day in range(1, 31):  # April 2017 has 30 days
    print('Getting info for day {}..'.format(day))
    url = 'https://www.timeanddate.com/scripts/cityajax.php?n=belgium/antwerp&mode=historic&hd=201704{:02d}&month=4&year=2017&json=1'.format(day)

    data = requests.get(url).text
    # the response is a JavaScript literal with unquoted keys (c, h, s);
    # wrap each key in double quotes so json.loads() can parse it
    data = json.loads(re.sub(r'(c|h|s):', r'"\1":', data))

    # uncomment this to print raw data:
    # print(json.dumps(data, indent=4))

    # construct the table from json:
    table = '<table>'
    for row in data:
        table += '<tr>'
        for cell in row['c']:
            table += '<td>' + BeautifulSoup(cell['h'], 'html.parser').get_text(strip=True, separator=' ') + '</td>'
        table += '</tr>'
    table += '</table>'

    # `table` now holds an HTML table; parse it with BeautifulSoup, or pass it to pandas
    # (recent pandas versions expect a file-like object rather than a literal HTML string):
    df = pd.read_html(StringIO(table))[0]
    print(df)
    print('-' * 120)

This prints:

Getting info for day 1..
                      0   1      2                            3      4  5     6          7      8
0   12:20 am Sat, Apr 1 NaN  50 °F                       Clear.  2 mph  ↑   94%  29.92 "Hg   2 mi
1              12:50 am NaN  46 °F                         Fog.  2 mph  ↑  100%  29.92 "Hg   2 mi
2               1:20 am NaN  48 °F                   Light fog.  3 mph  ↑   87%  29.89 "Hg   0 mi
3               1:50 am NaN  48 °F                       Clear.  3 mph  ↑   94%  29.89 "Hg   1 mi
4               2:20 am NaN  46 °F                         Fog.  5 mph  ↑  100%  29.89 "Hg   1 mi
5               3:20 am NaN  46 °F                       Clear.  3 mph  ↑   93%  29.89 "Hg   1 mi
6               3:50 am NaN  46 °F                         Fog.  6 mph  ↑   93%  29.86 "Hg   1 mi
7               4:20 am NaN  46 °F                         Fog.  3 mph  ↑  100%  29.86 "Hg   1 mi
8               4:50 am NaN  46 °F                         Fog.  3 mph  ↑  100%  29.86 "Hg   1 mi
9               5:20 am NaN  46 °F                         Fog.  2 mph  ↑   93%  29.86 "Hg   2 mi
10              5:50 am NaN  48 °F                       Clear.  3 mph  ↑   87%  29.86 "Hg   4 mi
11              6:20 am NaN  48 °F                       Clear.  5 mph  ↑   87%  29.83 "Hg   4 mi
12              6:50 am NaN  48 °F                       Clear.  5 mph  ↑   94%  29.86 "Hg   4 mi
13              7:20 am NaN  50 °F            Sprinkles. Clear.  6 mph  ↑   94%  29.86 "Hg   4 mi
14              7:50 am NaN  52 °F    Sprinkles. Broken clouds.  9 mph  ↑   88%  29.86 "Hg   3 mi
15              8:20 am NaN  52 °F    Light rain. Partly sunny.  8 mph  ↑   88%  29.86 "Hg   5 mi
16              8:50 am NaN  52 °F  Light rain. Passing clouds.  6 mph  ↑   94%  29.86 "Hg   5 mi
17              9:20 am NaN  52 °F       Drizzle. Partly sunny.  5 mph  ↑   94%  29.86 "Hg   5 mi
18              9:50 am NaN  52 °F               Broken clouds.  5 mph  ↑   94%  29.86 "Hg   5 mi
19             10:20 am NaN  52 °F               Broken clouds.  6 mph  ↑   94%  29.89 "Hg    NaN
20             10:50 am NaN  52 °F    Sprinkles. Broken clouds.  8 mph  ↑   94%  29.89 "Hg   5 mi
21             11:20 am NaN  52 °F                Partly sunny.  5 mph  ↑   94%  29.89 "Hg    NaN
22             11:50 am NaN  54 °F            Scattered clouds.  2 mph  ↑   88%  29.89 "Hg    NaN
23             12:20 pm NaN  55 °F            Scattered clouds.  5 mph  ↑   82%  29.89 "Hg    NaN
24             12:50 pm NaN  55 °F            Scattered clouds.  3 mph  ↑   77%  29.89 "Hg    NaN
25              1:20 pm NaN  57 °F              Passing clouds.  5 mph  ↑   72%  29.89 "Hg    NaN
26              1:50 pm NaN  57 °F              Passing clouds.  3 mph  ↑   67%  29.89 "Hg    NaN
27              2:20 pm NaN  57 °F              Passing clouds.  7 mph  ↑   72%  29.89 "Hg    NaN
28              2:50 pm NaN  57 °F            Scattered clouds.  3 mph  ↑   72%  29.89 "Hg    NaN
29              3:20 pm NaN  55 °F    Sprinkles. Broken clouds.  9 mph  ↑   77%  29.89 "Hg   4 mi
30              3:50 pm NaN  55 °F    Sprinkles. Broken clouds.  3 mph  ↑   77%  29.86 "Hg   5 mi
31              4:20 pm NaN  55 °F    Sprinkles. Broken clouds.  2 mph  ↑   82%  29.89 "Hg    NaN
32              4:50 pm NaN  57 °F            Scattered clouds.  2 mph  ↑   77%  29.86 "Hg    NaN
33              5:20 pm NaN  57 °F            Scattered clouds.  7 mph  ↑   72%  29.89 "Hg    NaN
34              5:50 pm NaN  55 °F            Scattered clouds.  6 mph  ↑   88%  29.89 "Hg    NaN
35              6:20 pm NaN  55 °F              Passing clouds.  6 mph  ↑   82%  29.89 "Hg    NaN
36              6:50 pm NaN  55 °F              Passing clouds.  3 mph  ↑   82%  29.89 "Hg    NaN
37              7:20 pm NaN  54 °F              Passing clouds.  5 mph  ↑   94%  29.89 "Hg    NaN
38              7:50 pm NaN  54 °F              Passing clouds.  5 mph  ↑   88%  29.89 "Hg    NaN
39              8:20 pm NaN  54 °F              Passing clouds.  7 mph  ↑   88%  29.92 "Hg    NaN
40              8:50 pm NaN  54 °F                       Clear.  7 mph  ↑   88%  29.92 "Hg  10 mi
41              9:20 pm NaN  54 °F                       Clear.  2 mph  ↑   88%  29.92 "Hg  10 mi
42              9:50 pm NaN  52 °F                       Clear.  5 mph  ↑   94%  29.92 "Hg  10 mi
43             10:20 pm NaN  48 °F                       Clear.  2 mph  ↑  100%  29.95 "Hg  10 mi
44             10:50 pm NaN  52 °F                       Clear.  3 mph  ↑   88%  29.95 "Hg   4 mi
45             11:20 pm NaN  46 °F                         Fog.  2 mph  ↑   93%  29.95 "Hg   1 mi
46             11:50 pm NaN  46 °F                       Clear.  3 mph  ↑   93%  29.95 "Hg   0 mi
------------------------------------------------------------------------------------------------------------------------
Getting info for day 2..
                      0   1      2                  3       4  5     6          7      8
0   12:20 am Sun, Apr 2 NaN  45 °F               Fog.   2 mph  ↑  100%  29.95 "Hg   0 mi
1              12:50 am NaN  45 °F               Fog.   2 mph  ↑   93%  29.98 "Hg   1 mi
2               1:20 am NaN  45 °F               Fog.   2 mph  ↑  100%  29.95 "Hg   0 mi
3               1:50 am NaN  45 °F             Clear.   3 mph  ↑   87%  29.98 "Hg   4 mi
4               2:20 am NaN  48 °F             Clear.   6 mph  ↑   87%  29.98 "Hg  10 mi
5               2:50 am NaN  48 °F             Clear.   2 mph  ↑   87%  29.98 "Hg  10 mi
6               3:20 am NaN  48 °F             Clear.   5 mph  ↑   87%  29.98 "Hg  10 mi
7               3:50 am NaN  48 °F             Clear.   2 mph  ↑   87%  29.98 "Hg   6 mi
8               4:50 am NaN  46 °F             Clear.   2 mph  ↑   87%  30.01 "Hg  10 mi
9               5:20 am NaN  46 °F    Passing clouds.   3 mph  ↑   87%  30.01 "Hg    NaN
10              5:50 am NaN  46 °F             Clear.   2 mph  ↑   87%  30.01 "Hg  10 mi
11              6:20 am NaN  46 °F             Clear.   1 mph  ↑   87%  30.04 "Hg   4 mi
12              6:50 am NaN  45 °F         Light fog.   2 mph  ↑   93%  30.04 "Hg   5 mi


... and so on.
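Building an HTML string and re-parsing it with pd.read_html works, but the text rows can also be collected directly from the decoded payload. The sketch below assumes the payload shape used in the answer (a list of rows, each with a 'c' list of cells whose HTML sits under 'h'). SAMPLE_DATA and the helper names are hypothetical, and it swaps in the standard library's html.parser for BeautifulSoup only to stay dependency-free; the BeautifulSoup get_text call from the answer would do the same job.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect non-blank text nodes, roughly like get_text(strip=True, separator=' ')."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def cell_text(fragment):
    """Strip the tags from one cell's HTML fragment and join its text pieces with spaces."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()  # flush any trailing text still buffered by the parser
    return " ".join(collector.parts)


def rows_from_payload(data):
    """Turn the decoded payload (a list of {'c': [{'h': html}, ...]}) into rows of plain text."""
    return [[cell_text(cell["h"]) for cell in row["c"]] for row in data]


# Hypothetical payload, already decoded with json.loads as in the answer.
SAMPLE_DATA = [
    {"c": [{"h": "12:20 am<br>Sat, Apr 1"}, {"h": ""}, {"h": "<span>50 °F</span>"}, {"h": "Clear."}]},
    {"c": [{"h": "12:50 am"}, {"h": ""}, {"h": "<span>46 °F</span>"}, {"h": "Fog."}]},
]

rows = rows_from_payload(SAMPLE_DATA)
for row in rows:
    print(row)
```

The resulting rows can be passed straight to pd.DataFrame(rows). Empty cells stay as empty strings here instead of becoming NaN as they do after the read_html round trip.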