How to extract data from the site (corona) with BeautifulSoup?
I want to save the number of articles for each country, as country name plus article count, in a file for my research work on the site below. I wrote this code for that purpose, but unfortunately it doesn't work.
!pip install bs4
from bs4 import BeautifulSoup # this module helps with web scraping.
import requests # this module helps us to download a web page
url='http://corona.sid.ir/'
data = requests.get(url).text
soup = BeautifulSoup(data,"lxml") # create a soup object using the variable 'data'
soup.find_all(attrs={"class":"value"})
The result is an empty list:
[]
You are using the wrong URL: the per-country counts are rendered inside the map SVG, which is loaded separately, so requesting the main page returns none of the `value` elements. Fetch the SVG directly instead:
from bs4 import BeautifulSoup # this module helps with web scraping.
import requests # this module helps us to download a web page
import pandas as pd
url = 'http://corona.sid.ir/world.svg'
data = requests.get(url).text
soup = BeautifulSoup(data, "lxml") # create a soup object from the downloaded SVG

rows = []
for each in soup.find_all(attrs={"class": "value"}):
    row = {}
    row['country'] = each.text.split(':')[0]
    row['count'] = each.text.split(':')[1].strip()
    rows.append(row)
df = pd.DataFrame(rows)
Output:
print(df)
country count
0 Andorra 17
1 United Arab Emirates 987
2 Afghanistan 67
3 Albania 143
4 Armenia 49
.. ... ...
179 Yemen 54
180 Mayotte 0
181 South Africa 1938
182 Zambia 127
183 Zimbabwe 120
[184 rows x 2 columns]
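Since the stated goal was to save the country/count pairs to a file, the resulting DataFrame can be written out with `to_csv`. A minimal sketch, using sample rows standing in for the scraped data (the filename is an arbitrary choice):

```python
import pandas as pd

# Sample rows in the same shape the scraping loop produces.
rows = [
    {"country": "Andorra", "count": 17},
    {"country": "Albania", "count": 143},
]
df = pd.DataFrame(rows)

# Write the country/count table to a CSV file for later use.
df.to_csv("corona_article_counts.csv", index=False)
```

Passing `index=False` keeps the row index out of the file, so each line is just `country,count`.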