BeautifulSoup can't find my table because of some weird string
Hi, I've been trying to get a table from wunderground with BeautifulSoup, but it isn't working.
I think it might be the strange string next to the table header, but I can't fix it.
This is my code:
from bs4 import BeautifulSoup
import requests
url='https://www.wunderground.com/history/daily/LEMD/date/2020-10-21'
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "html.parser")
table = soup.find("table", {"class": "mat-table cdk-table mat-sort ng-star-inserted"})
table_data = table.tbody.find_all("tr")
And the error:
Traceback (most recent call last):
File "weather_poc.py", line 12, in <module>
table_data = table.tbody.find_all("tr")
AttributeError: 'NoneType' object has no attribute 'tbody'
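For what it's worth, the traceback already points at the cause: soup.find() returns None when nothing matches, and None has no .tbody. A minimal check (a sketch reusing the same URL and parser as above) shows the table is missing from the downloaded HTML entirely:

import requests
from bs4 import BeautifulSoup

html_content = requests.get('https://www.wunderground.com/history/daily/LEMD/date/2020-10-21').text
soup = BeautifulSoup(html_content, "html.parser")

# find() returns None when no element matches, so calling .tbody on the
# result raises AttributeError -- the table is not in the served HTML
print(soup.find("table", {"class": "mat-table cdk-table mat-sort ng-star-inserted"}))
print("<table" in html_content)  # checks whether any table markup arrives with the page at all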
The data you see is loaded via JavaScript from an external URL, so requests only receives the initial HTML; that's why soup.find() returns None. You can load the data directly with the requests/json modules instead (such API URLs can be spotted in the browser's developer tools, under the Network tab). For example:
import json
import requests
import pandas as pd
url = 'https://api.weather.com/v1/location/LEMD:9:ES/observations/historical.json?apiKey=6532d6454b8aa370768e63d6ba5a832e&units=e&startDate=20201021&endDate=20201021'
data = requests.get(url).json()
# uncomment this line to print all data:
# print(json.dumps(data, indent=4))
df = pd.json_normalize(data['observations'])
df.to_csv('data.csv', index=False)
This creates data.csv (screenshot from LibreOffice).
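If you want tidier output, a small post-processing step helps. This is only a sketch: the column names 'valid_time_gmt' (epoch seconds) and 'temp' are assumptions about the observation fields this endpoint returns, so adjust them to whatever actually appears in your data.csv:

import pandas as pd

df = pd.read_csv('data.csv')
# 'valid_time_gmt' is assumed to hold epoch seconds; convert to readable datetimes
df['valid_time_gmt'] = pd.to_datetime(df['valid_time_gmt'], unit='s')
# keep only a couple of assumed columns for a quick look
print(df[['valid_time_gmt', 'temp']].head())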