Failed to extract HTML table data using Beautiful Soup in Python
I am trying to replicate this code and make some charts, but I cannot get the CSV file. I ran exactly the same code to no avail: it prints an empty DataFrame.
Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup
import geopandas as gpd
from prettytable import PrettyTable
url = 'https://www.mohfw.gov.in/'
# make a GET request to fetch the raw HTML content
web_content = requests.get(url).content
# parse the html content
soup = BeautifulSoup(web_content, "html.parser")
# remove newlines from the text of each cell
extract_contents = lambda row: [x.text.replace('\n', '') for x in row]
# find all table rows and data cells within
stats = []
all_rows = soup.find_all('tr')
for row in all_rows:
    stat = extract_contents(row.find_all('td'))
    # notice that the data that we require is now a list of length 5
    if len(stat) == 5:
        stats.append(stat)
# now convert the data into a pandas dataframe for further processing
new_cols = ["Sr.No", "States/UT","Confirmed","Recovered","Deceased"]
state_data = pd.DataFrame(data = stats, columns = new_cols)
state_data.head()
Any help is appreciated.
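You can get all of the data from a URI that returns JSON. You will need to map some of the column names, then use the returned columns to calculate the change since yesterday. The new_ columns are today's values.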
import pandas as pd
import requests
r = requests.get('https://www.mohfw.gov.in/data/datanew.json').json()
df = pd.DataFrame(r)
df
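The mapping and the change-since-yesterday calculation could look roughly like the sketch below. It assumes the JSON records carry the fields state_name, positive, cured and death, each with a new_ counterpart, and that the unprefixed fields hold yesterday's values; those field names are an assumption, so check them against df.columns on the live response. The summarise helper and the sample records are made up for illustration and are not real figures.

```python
import pandas as pd

# Assumed JSON field names mapped to the column labels used in the question.
LABELS = {"positive": "Confirmed", "cured": "Recovered", "death": "Deceased"}


def fetch_records(url="https://www.mohfw.gov.in/data/datanew.json"):
    """Download the JSON records (needs network access)."""
    import requests  # imported here so the offline example below runs without it
    return requests.get(url, timeout=30).json()


def summarise(records):
    """Return today's counts plus the change since yesterday for each state."""
    df = pd.DataFrame(records)
    out = pd.DataFrame({"States/UT": df["state_name"]})
    for field, label in LABELS.items():
        # JSON values may arrive as strings, so convert them to numbers first.
        today = pd.to_numeric(df["new_" + field], errors="coerce")
        yesterday = pd.to_numeric(df[field], errors="coerce")
        out[label] = today
        out[label + " change"] = today - yesterday
    return out


# Made-up sample records in the assumed shape, so the sketch runs offline.
sample = [
    {"state_name": "State A", "positive": "100", "cured": "80", "death": "5",
     "new_positive": "110", "new_cured": "85", "new_death": "5"},
    {"state_name": "State B", "positive": "200", "cured": "150", "death": "10",
     "new_positive": "205", "new_cured": "160", "new_death": "11"},
]
summary = summarise(sample)
print(summary)
```

With network access, summarise(fetch_records()) would apply the same steps to the live data, provided the field names match the assumption above.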