How to extract data from the site (corona) with BeautifulSoup?
I want to save the number of articles for each country, as country name plus article count, in a file for my research work on the site below. I wrote this code for that purpose, but unfortunately it doesn't work.
!pip install bs4
from bs4 import BeautifulSoup # this module helps with web scraping.
import requests # this module helps us to download a web page
url='http://corona.sid.ir/'
data = requests.get(url).text
soup = BeautifulSoup(data,"lxml") # create a soup object using the variable 'data'
soup.find_all(attrs={"class":"value"})
The result is an empty list:
[]
You are using the wrong URL: the per-country counts are rendered inside the map SVG, which is loaded separately, so requesting the main page returns none of the `value` elements. Fetch the SVG directly instead:
from bs4 import BeautifulSoup # this module helps with web scraping.
import requests # this module helps us to download a web page
import pandas as pd
url = 'http://corona.sid.ir/world.svg'
data = requests.get(url).text
soup = BeautifulSoup(data, "lxml") # create a soup object from the downloaded SVG

rows = []
for each in soup.find_all(attrs={"class": "value"}):
    row = {}
    row['country'] = each.text.split(':')[0]
    row['count'] = each.text.split(':')[1].strip()
    rows.append(row)
df = pd.DataFrame(rows)
Output:
print(df)
country count
0 Andorra 17
1 United Arab Emirates 987
2 Afghanistan 67
3 Albania 143
4 Armenia 49
.. ... ...
179 Yemen 54
180 Mayotte 0
181 South Africa 1938
182 Zambia 127
183 Zimbabwe 120
[184 rows x 2 columns]
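Since the stated goal was to save the country/count pairs to a file, the resulting DataFrame can be written out with `to_csv`. A minimal sketch, using sample rows standing in for the scraped data (the filename is an arbitrary choice):

```python
import pandas as pd

# Sample rows in the same shape the scraping loop produces.
rows = [
    {"country": "Andorra", "count": 17},
    {"country": "Albania", "count": 143},
]
df = pd.DataFrame(rows)

# Write the country/count table to a CSV file for later use.
df.to_csv("corona_article_counts.csv", index=False)
```

Passing `index=False` keeps the row index out of the file, so each line is just `country,count`.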