Failed to extract html table data using Beautiful Soup in Python

Question

I am trying to replicate this code and make some graphs, but I can't get the csv file. I ran exactly the same code to no avail: it prints an empty DataFrame.

Code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup
import geopandas as gpd
from prettytable import PrettyTable

url = 'https://www.mohfw.gov.in/'
# make a GET request to fetch the raw HTML content
web_content = requests.get(url).content

# parse the html content
soup = BeautifulSoup(web_content, "html.parser")

# remove newline characters from the text of each cell
extract_contents = lambda row: [x.text.replace('\n', '') for x in row]

# find all table rows and data cells within
stats = []
all_rows = soup.find_all('tr')
for row in all_rows:
    stat = extract_contents(row.find_all('td'))
    # notice that the data that we require is now a list of length 5
    if len(stat) == 5:
        stats.append(stat)

# now convert the data into a pandas dataframe for further processing
new_cols = ["Sr.No", "States/UT","Confirmed","Recovered","Deceased"]
state_data = pd.DataFrame(data = stats, columns = new_cols)
state_data.head()

Any help is appreciated.

Answer 1

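You can get all the data from a URI that returns JSON. You will need to map some of the column names, and then do calculations with the returned columns to work out the change since yesterday. The new_ columns are today's values.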

import pandas as pd
import requests

# request the JSON endpoint and decode the response body
r = requests.get('https://www.mohfw.gov.in/data/datanew.json').json()
# build a DataFrame from the decoded JSON
df = pd.DataFrame(r)
df
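
The column mapping and the change-since-yesterday calculation mentioned above could look roughly like the sketch below. It runs on a couple of made-up records so that it works offline. The field names (state_name, positive, cured, death and their new_ counterparts) are assumptions about what the feed returns, as is treating the un-prefixed fields as yesterday's values, so check them against df.columns before relying on this. The summarize helper and the FIELDS mapping are hypothetical names used only for illustration.

```python
import pandas as pd

# Made-up sample records for illustration only; not real figures.
# The field names are an assumption about the JSON feed.
sample = [
    {"state_name": "Kerala", "positive": "100", "cured": "80", "death": "2",
     "new_positive": "110", "new_cured": "85", "new_death": "3"},
    {"state_name": "Goa", "positive": "50", "cured": "40", "death": "1",
     "new_positive": "50", "new_cured": "45", "new_death": "1"},
]

# Assumed mapping: readable column name -> field name in the feed.
FIELDS = {"Confirmed": "positive", "Recovered": "cured", "Deceased": "death"}


def summarize(records):
    """Map feed fields to readable columns and add the change since yesterday."""
    raw = pd.DataFrame(records)
    out = pd.DataFrame({"States/UT": raw["state_name"]})
    for label, field in FIELDS.items():
        # values may arrive as strings, so convert before subtracting
        today = pd.to_numeric(raw["new_" + field], errors="coerce")
        yesterday = pd.to_numeric(raw[field], errors="coerce")
        out[label] = today
        out[label + " change"] = today - yesterday
    return out


state_data = summarize(sample)
print(state_data)
```

With the live data you would call summarize(r) on the decoded JSON instead of the sample, and state_data.to_csv('state_data.csv', index=False) would then write the csv file the question asks about.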

python

beautifulsoup

html-parsing

pandas