Unable to download files using urllib or requests
I'm trying to download a file from http://hdr.undp.org/en/indicators/137506#.
However, urllib gives me a 403 Forbidden error.
The code below only gives me the HTTP response, not the CSV file. Can anyone help?
```python
import requests

# Define the remote file to retrieve
remote_url = 'http://hdr.undp.org/en/indicators/137506'
# Define the local filename to save data
local_file = 'local_copy.csv'
# Make http request for remote file data
data = requests.get(remote_url)
# Save file data to local copy
with open(local_file, 'wb') as file:
    file.write(data.content)
```
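Two separate things are going on here. The 403 from urllib is commonly caused by the server rejecting urllib's default `Python-urllib/3.x` User-Agent; `requests` sends its own `python-requests/...` value, which is why it gets a response at all. That response, however, is the HTML of the indicator page: `/en/indicators/137506` is a web page, not a CSV file, so writing `data.content` to `local_copy.csv` just saves HTML. Below is a minimal sketch, assuming the server filters only on User-Agent; the header value and the helper names `build_request`, `is_html` and `download` are illustrative, not part of any library API.

```python
import urllib.request

# Hypothetical browser-like User-Agent; some servers answer urllib's default
# "Python-urllib/3.x" value with 403 Forbidden.
BROWSER_UA = "Mozilla/5.0 (compatible; data-download-script)"


def build_request(url: str) -> urllib.request.Request:
    """Build a urllib Request that sends a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def is_html(content_type: str) -> bool:
    """Return True if a Content-Type header value describes an HTML page."""
    # Header values can carry parameters, e.g. "text/html; charset=utf-8".
    return content_type.split(";")[0].strip().lower() == "text/html"


def download(url: str, local_file: str) -> None:
    """Download url to local_file, refusing to save an HTML page as data."""
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        content_type = resp.headers.get("Content-Type", "")
        if is_html(content_type):
            raise ValueError(f"{url} returned an HTML page, not a data file")
        with open(local_file, "wb") as f:
            f.write(resp.read())
```

Called as `download('http://hdr.undp.org/en/indicators/137506', 'local_copy.csv')`, this should raise `ValueError` instead of silently writing HTML into the CSV file, as long as the server labels the page `text/html`; the actual data has to come from the files the page loads.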
Here are the different URLs the page loads its data from; take whichever you need:
```python
import requests
import pandas as pd

urls = [
    'http://hdr.undp.org/sites/all/themes/hdr_theme/js/bars.json',
    'http://hdr.undp.org/sites/all/themes/hdr_theme/js/footnotes.json',
    'http://hdr.undp.org/sites/all/themes/hdr_theme/js/rankiso.json',
    'http://hdr.undp.org/sites/all/themes/hdr_theme/js/aggregates.json',
    'http://hdr.undp.org/sites/all/themes/hdr_theme/js/summary.json'
]

for remote_url in urls:
    data = requests.get(remote_url)
    print(remote_url)
    df = pd.DataFrame(data.json())
    # to_markdown() needs the optional "tabulate" package
    print(df.head(3).to_markdown())
    # df.to_csv("out.csv")
```
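The commented-out `df.to_csv("out.csv")` line would overwrite the same `out.csv` on every pass through the loop. Here is a sketch that gives each file its own CSV, named after the JSON file (`bars.json` becomes `bars.csv`), and stops on HTTP errors such as 403 instead of trying to parse an error page. It assumes, as the tables in this answer suggest, that each endpoint returns a JSON list of records; `csv_name_for`, `records_to_csv` and `save_all` are hypothetical helper names.

```python
import os
from urllib.parse import urlparse

import pandas as pd
import requests


def csv_name_for(url: str) -> str:
    """Derive a local CSV name from a URL, e.g. '.../js/bars.json' -> 'bars.csv'."""
    stem, _ext = os.path.splitext(os.path.basename(urlparse(url).path))
    return stem + ".csv"


def records_to_csv(records, path: str) -> pd.DataFrame:
    """Build a DataFrame from a JSON list of records and save it as CSV."""
    df = pd.DataFrame(records)
    # index=False keeps pandas' 0, 1, 2 row labels out of the file.
    df.to_csv(path, index=False)
    return df


def save_all(urls) -> None:
    """Download each JSON URL and write it to its own CSV file."""
    for remote_url in urls:
        resp = requests.get(remote_url, timeout=30)
        # Stop on 403/404 and similar instead of parsing an error page.
        resp.raise_for_status()
        path = csv_name_for(remote_url)
        records_to_csv(resp.json(), path)
        print(f"{remote_url} -> {path}")
```

With the `urls` list from the loop, `save_all(urls)` would write `bars.csv`, `footnotes.csv`, `rankiso.csv`, `aggregates.csv` and `summary.csv` to the current directory. For reference, the original loop prints the first three rows of each file: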
http://hdr.undp.org/sites/all/themes/hdr_theme/js/bars.json

| | indicator | iso3 | country | year | interval | value | id |
|---|---|---|---|---|---|---|---|
| 0 | Refugees by country of origin (thousands) | AFG | Afghanistan | 2019 | | 2721.47 | 21806 |
| 1 | Population with at least some secondary education (% ages 25 and older) | AFG | Afghanistan | 1990 | | 8.2 | 23806 |
| 2 | Population with at least some secondary education (% ages 25 and older) | AFG | Afghanistan | 1995 | | 9.8 | 23806 |

http://hdr.undp.org/sites/all/themes/hdr_theme/js/footnotes.json

| | footnote_id | indicator_id | iso3 | year |
|---|---|---|---|---|
| 0 | kkk | 21806 | PSE | 2019 |
| 1 | uuuu | 21806 | PSE | 2019 |
| 2 | uuuu | 21806 | NPL | 2019 |

http://hdr.undp.org/sites/all/themes/hdr_theme/js/rankiso.json

| | iso3 | rank |
|---|---|---|
| 0 | AFG | 169 |
| 1 | AGO | 148 |
| 2 | ALB | 69 |

http://hdr.undp.org/sites/all/themes/hdr_theme/js/aggregates.json

| | indicator_id | country_or_hierarchy_id | aggregation_type | aggregation_point_name | aggregation_id | sort_order | year | value | order |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 21806 | 302 | Human Development | Very high human development | 1 | 1 | 2018 | 1 | |
| 1 | 21806 | 202 | Human Development | High human development | 1 | 2 | 2018 | 1 | |
| 2 | 21806 | 102 | Human Development | Medium human development | 1 | 3 | 2018 | 1 | |

http://hdr.undp.org/sites/all/themes/hdr_theme/js/summary.json

| | id | dimension | indicator | source | definition | sdg |
|---|---|---|---|---|---|---|
| 0 | 21806 | Human security | Refugees by country of origin (thousands) | UNHCR .... | Number of people ... | ... situations |
| 1 | 23806 | Education | Population with at least some secondary education (% ages 25 and older) | UNESCO .. | Percentage ... | ... |
| 2 | 23906 | Education | Population with at least some secondary education, female (% ages 25 and older) | UNESCO ... | Percentage of .. | SDG 4.4 .. |
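
`bars.json` appears to hold the actual data points in long format, one row per indicator `id`, country and year. To get a table closer to a per-indicator download, with countries as rows and years as columns, one option is to filter on `id` and pivot. This is a sketch assuming the column names shown for `bars.json`; `indicator_table` is a hypothetical helper, and the `id` to pass should be looked up in `summary.json`, since it is not confirmed that it matches the number in the page URL.

```python
import pandas as pd


def indicator_table(bars: pd.DataFrame, indicator_id) -> pd.DataFrame:
    """Return a country-by-year table of `value` for one indicator id.

    Assumes bars.json-style columns: id, iso3, country, year, value.
    """
    # Compare as strings because JSON ids may arrive as "21806" or 21806.
    rows = bars[bars["id"].astype(str) == str(indicator_id)].copy()
    # Values may be strings in the JSON; non-numeric entries become NaN.
    rows["value"] = pd.to_numeric(rows["value"], errors="coerce")
    # One row per (iso3, country), one column per year.
    return rows.pivot_table(index=["iso3", "country"], columns="year", values="value")
```

For example, build `bars = pd.DataFrame(requests.get('http://hdr.undp.org/sites/all/themes/hdr_theme/js/bars.json').json())`, then call `indicator_table(bars, 23806).to_csv('indicator_23806.csv')`. `pivot_table` averages any duplicate country/year rows; `pivot` raises an error on duplicates instead, if you would rather be told about them.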