How to scrape tbody from a collapsible table using the BeautifulSoup library?

I recently worked on a project based on a COVID-19 dashboard, where I scrape data from this website, which has a collapsible table. Everything worked until recently, when the Heroku app started showing errors. I reran my code on my local machine, and the error occurred while scraping the tbody. I then figured out that the site I scrape has changed the way the table is rendered, and my code can no longer grab it. When I view the page source, I cannot find the table (tbody) that appears on the page. If I inspect a table row in the browser, I can see the tbody and all the data, but none of it is in the page source. How can I scrape the table now? My code: The table I want to grab:

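The data you see on the page is loaded via Ajax from an external URL, which is why it does not appear in the page source. You can load it with the requests/json modules: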

import json
import requests


url = 'https://www.mohfw.gov.in/data/datanew.json'
data = requests.get(url).json()

# uncomment to print all data:
# print(json.dumps(data, indent=4))

# print some data on screen:
for d in data:
    print('{:<30} {:<10} {:<10} {:<10} {:<10}'.format(d['state_name'], d['active'], d['positive'], d['cured'], d['death']))

Prints:

Andaman and Nicobar Islands    329        548        214        5         
Andhra Pradesh                 75720      140933     63864      1349      
Arunachal Pradesh              670        1591       918        3         
Assam                          9814       40269      30357      98        
Bihar                          17579      51233      33358      296       
Chandigarh                     369        1051       667        15        
Chhattisgarh                   2803       9086       6230       53        

... and so on.
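
If the script runs unattended (for example on Heroku), it may help to fail loudly when the endpoint is unreachable or the record layout changes again. Below is a minimal sketch of that idea, not a definitive implementation. The helper names `fetch_records` and `format_rows`, the 10-second timeout, and the 'n/a' placeholder are my own assumptions; only the URL and the field names come from the snippet above.

```python
URL = "https://www.mohfw.gov.in/data/datanew.json"  # endpoint from the answer above
FIELDS = ("state_name", "active", "positive", "cured", "death")


def fetch_records(url=URL, timeout=10):
    """Download the JSON feed and return it as a list of dicts (hypothetical helper)."""
    import requests  # imported lazily so format_rows can be used without requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()


def format_rows(records, fields=FIELDS):
    """Return one aligned text line per record; missing keys become 'n/a'."""
    lines = []
    for record in records:
        # .get() avoids a KeyError if the site renames or drops a field
        values = [str(record.get(name, "n/a")) for name in fields]
        lines.append("{:<30} {:<10} {:<10} {:<10} {:<10}".format(*values))
    return lines
```

Printing each line of `format_rows(fetch_records())` should give the same layout as the listing above, with 'n/a' wherever a record lacks one of the expected keys.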

Try this:

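import requests
import pandas as pd

r = requests.get('https://www.mohfw.gov.in/data/datanew.json')
j = r.json()  # list of dicts, one per row of the table

# Column names come from the keys of the first record
columns = list(j[0])
# One list of values per record, in that record's key order
data = [[record[k] for k in record] for record in j]

df = pd.DataFrame(data, columns=columns)
# Convert sno to numbers so the sort is numeric rather than alphabetical.
# No reset_index() here: on a Series it returns a DataFrame, which recent
# pandas versions refuse to assign to a single column.
df['sno'] = pd.to_numeric(df['sno'], errors='coerce')
df = df.sort_values('sno')
print(df.to_string())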

Prints:

    sno                                state_name  active positive    cured  death new_active new_positive new_cured new_death state_code
0     0               Andaman and Nicobar Islands     329      548      214      5        403          636       226         7         35
1     1                            Andhra Pradesh   75720   140933    63864   1349      72188       150209     76614      1407         28
2     2                         Arunachal Pradesh     670     1591      918      3        701         1673       969         3         12
3     3                                     Assam    9814    40269    30357     98      10183        41726     31442       101         18
4     4                                     Bihar   17579    51233    33358    296      18937        54240     34994       309         10
5     5                                Chandigarh     369     1051      667     15        378         1079       683        18         04
6     6                              Chhattisgarh    2803     9086     6230     53       2720         9385      6610        55         22
7     7  Dadra and Nagar Haveli and Daman and Diu     412     1100      686      2        418         1145       725         2         26
8     8                                     Delhi   10705   135598   120930   3963      10596       136716    122131      3989         07
9     9                                       Goa    1657     5913     4211     45       1707         6193      4438        48         30
10   10                                   Gujarat   14090    61438    44907   2441      14300        62463     45699      2464         24

... and so on.
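
If pandas is not available, a similar ordering can be sketched with the standard library alone. The leading zero in state_code (04, 07) suggests the values arrive as strings, so I am assuming the counts may be strings too and convert them before comparing. `to_int` and `sort_by_field` are hypothetical helper names, and the sample rows are copied from the output above.

```python
def to_int(value, default=0):
    """Convert '9814' or 9814 to int; return default for blanks or non-numeric text."""
    try:
        return int(str(value).strip())
    except ValueError:
        return default


def sort_by_field(records, field="active", descending=True):
    """Sort records numerically by one field (hypothetical helper)."""
    return sorted(records, key=lambda r: to_int(r.get(field)), reverse=descending)


# Sample rows copied from the output above
sample = [
    {"state_name": "Assam", "active": "9814", "state_code": "18"},
    {"state_name": "Chandigarh", "active": "369", "state_code": "04"},
    {"state_name": "Bihar", "active": "17579", "state_code": "10"},
]

for row in sort_by_field(sample):
    print(row["state_name"], to_int(row["active"]))
# Bihar 17579
# Assam 9814
# Chandigarh 369
```

Sorting the raw strings instead would put "17579" before "369" before "9814", because strings compare character by character, which is the reason for the conversion step.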