I am not getting any output when web scraping with Beautiful Soup and Python
I am using Python, Beautiful Soup, and lxml to extract some information from a site, but I get no output and no errors. Here is the code:
from bs4 import BeautifulSoup
import requests

html_file = requests.get('https://www.covid19india.org').text
soup = BeautifulSoup(html_file, 'lxml')  # name the parser explicitly
states = soup.find_all('div', class_='state-name fadeInUp')
print(states)
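You are running into this because the response that requests receives from the site is not the same as the page you see in your browser.
Hint:
You can render the HTML you actually received (in a Jupyter/IPython notebook) like this: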
from IPython.display import display, HTML
display(HTML(html_file))
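Outside a notebook, a rough way to run the same check is to ask whether the class you are searching for appears anywhere in the raw response. This is only a minimal sketch: the helper name `class_in_static_html` and the `sample_html` string are made up for illustration, and `sample_html` stands in for the `html_file` text from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for requests.get(...).text: a JavaScript-driven
# page often ships a nearly empty <body> plus a script that fills it in.
sample_html = """
<html>
  <body>
    <div id="root"></div>
    <script src="/static/js/main.js"></script>
  </body>
</html>
"""


def class_in_static_html(html, css_class):
    """Return True if any tag in the raw HTML carries css_class."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(class_=css_class) is not None


print(class_in_static_html(sample_html, "state-name"))  # False
```

If this returns False for the real response, the elements are most likely added later by a script, and no BeautifulSoup selector will find them in that response.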
The data is loaded from an external source via JavaScript, so BeautifulSoup does not see it. Here is an example of how to load the data with the requests module:
import json
import requests

url = "https://api.covid19india.org/v4/min/data.min.json"
data = requests.get(url).json()

# uncomment to print all data:
# print(json.dumps(data, indent=4))

# print some data
print(
    "{:<5} {:<10} {:<10} {:<10} {:<10} {:<10}".format(
        "STATE", "CONFIRMED", "DECEASED", "RECOVERED", "TESTED", "VACCINATED"
    )
)

# each key is a state code, each value holds that state's counters
for k, v in data.items():
    print(
        "{:<5} {:<10} {:<10} {:<10} {:<10} {:<10}".format(
            k,
            v["total"]["confirmed"],
            v["total"]["deceased"],
            v["total"]["recovered"],
            v["total"]["tested"],
            v["total"]["vaccinated"],
        )
    )
Prints:
STATE CONFIRMED  DECEASED   RECOVERED  TESTED     VACCINATED
AN    6789       98         6417       381349     117175
AP    1542079    9904       1323019    18435149   7857020
AR    23159      89         20339      525589     318092
AS    359640     2588       302889     9982316    3620118
BR    681199     4339       627548     28735144   9363116
CH    57737      680        51382      477484     302008
CT    941366     12391      852529     8508024    6812740
DL    1412959    22831      1354445    18595993   4970999
DN    9822       4          9167       72410      133590
GA    143192     2302       121562     778828     466488
GJ    780471     9469       686581     20763702   15176798
HP    175384     2638       141198     1806900    2303829
HR    728607     7317       666893     8585262    5198191
JH    324884     4714       293659     7998874    3790724
JK    263905     3465       210547     8140036    2905270
KA    2367742    24207      1829276    28453442   11799162
KL    2293633    6995       1979919    18555023   8597282
LA    17025      172        15264      237486     128352
LD    5756       20         4051       108986     29058
MH    5527092    86618      5070801    32441776   20465193
ML    27755      414        20480      527955     424005
MN    42565      661        35606      675645     403249
MP    757119     7394       682100     9148503    9533515
MZ    9732       30         7450       354526     301219
NL    19593      258        14149      207977     246656
OR    668422     2483       567382     11118984   6964278
PB    528676     12888      452318     8556117    4519318
PY    93167      1295       73936      971544     239864
RJ    903418     7475       764137     10118617   15708235
SK    12527      220        8916       108170     230968
TG    547727     3085       500247     14402251   5522361
TN    1770988    19598      1476761    25949517   7280700
TR    44356      452        37037      836629     1527207
TT    26285069   295508     23059017   324417870  191879503
UP    1659212    18760      1534176    46111719   15813654
UT    307566     5600       233266     4443937    2743665
WB    1229805    14054      1083570    11786397   12932811
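Direct indexing such as `v["total"]["vaccinated"]` raises `KeyError` if a region happens to lack one of the counters. A hedged sketch of a more tolerant version follows; it assumes the same nested layout as the JSON used above, and the `sample` data, the `FIELDS` tuple, and the `total_row` helper are invented for illustration rather than taken from the API.

```python
# Made-up sample in the same shape as the JSON used above
# (state code -> {"total": {...}}); the numbers are illustrative only.
sample = {
    "AA": {"total": {"confirmed": 120, "deceased": 3, "recovered": 100}},
    "BB": {"total": {"confirmed": 450, "deceased": 9, "recovered": 400, "tested": 9000}},
    "CC": {},
}

FIELDS = ("confirmed", "deceased", "recovered", "tested", "vaccinated")


def total_row(state, entry):
    """Build one table row, substituting 0 for any counter that is absent."""
    totals = entry.get("total", {})
    return (state,) + tuple(totals.get(field, 0) for field in FIELDS)


rows = [total_row(state, entry) for state, entry in sample.items()]
# Largest confirmed count first.
rows.sort(key=lambda row: row[1], reverse=True)

for row in rows:
    print("{:<5} {:<10} {:<10} {:<10} {:<10} {:<10}".format(*row))
```

To use it with the live data, pass the `data` dictionary from the example above in place of `sample`. Whether 0 is the right placeholder for a missing counter depends on what you do with the numbers afterwards.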