My code fails to download a CSV file from a URL with Python
I created some code to download a CSV file from a URL. The code downloads the HTML of the page instead; when I paste the URL that the code builds into my browser it works, but it does not work in the code.
I tried os, requests and urllib, but all of these options gave the same result.
This is the link I ultimately want to download as a CSV:
https://www.ishares.com/uk/individual/en/products/251567/ishares-asia-pacific-dividend-ucits-etf/1506575576011.ajax?fileType=csv&fileName=IAPD_holdings&dataType=fund
import requests
#this is the url where the csv is
url='https://www.ishares.com/uk/individual/en/products/251567/ishares-asia-pacific-dividend-ucits-etf?switchLocale=y&siteEntryPassthrough=true'
r = requests.get(url, allow_redirects=True)
response = requests.get(url)
if response.status_code == 200:
    print("Success")
else:
    print("Failure")
#find the url for the CSV
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.content,'lxml')
for i in soup.find_all('a', {'class': "icon-xls-export"}):
    print(i.get('href'))
# I get two types of files, one CSV and the other xls.
link_list=[]
for i in soup.find_all('a', {'class': "icon-xls-export"}):
    link_list.append(i.get('href'))
# I create the link with the CSV
url_csv = "https://www.ishares.com//"+link_list[0]
response_csv = requests.get(url_csv)
if response_csv.status_code == 200:
    print("Success")
else:
    print("Failure")
#Here I want to download the file
import urllib.request
with urllib.request.urlopen(url_csv) as holdings1, open('dataset.csv', 'w') as f:
    f.write(holdings1.read().decode())
You want to download the CSV data. The download needs cookies to work correctly, so I use requests.Session(), which gets and stores the cookies automatically. I write response_csv.content to the file because I already have it after the second request, so I don't have to make another request. Also, urllib.request would send a request without those cookies, and it might not work.
import requests
from bs4 import BeautifulSoup
s = requests.Session()
url='https://www.ishares.com/uk/individual/en/products/251567/ishares-asia-pacific-dividend-ucits-etf?switchLocale=y&siteEntryPassthrough=true'
response = s.get(url, allow_redirects=True)
if response.status_code == 200:
    print("Success")
else:
    print("Failure")
#find the url for the CSV
soup = BeautifulSoup(response.content,'lxml')
for i in soup.find_all('a', {'class': "icon-xls-export"}):
    print(i.get('href'))
# I get two types of files, one CSV and the other xls.
link_list=[]
for i in soup.find_all('a', {'class': "icon-xls-export"}):
    link_list.append(i.get('href'))
# I create the link with the CSV
url_csv = "https://www.ishares.com//"+link_list[0]
response_csv = s.get(url_csv)
if response_csv.status_code == 200:
    print("Success")
    with open('dataset.csv', 'wb') as f:
        f.write(response_csv.content)
else:
    print("Failure")
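As a side note, building url_csv by string concatenation produces a double slash after the host ("https://www.ishares.com//uk/..."), since the scraped href already starts with "/". A small sketch of a more robust join using the standard library's urllib.parse.urljoin (the href below is the CSV link from the question, used here for illustration):

```python
from urllib.parse import urljoin

base = "https://www.ishares.com"
# href as scraped from the page; it starts with "/", so naive
# concatenation with "https://www.ishares.com//" yields "com//uk/..."
href = ("/uk/individual/en/products/251567/"
        "ishares-asia-pacific-dividend-ucits-etf/1506575576011.ajax"
        "?fileType=csv&fileName=IAPD_holdings&dataType=fund")

# urljoin resolves the root-relative href against the base correctly
url_csv = urljoin(base, href)
print(url_csv)
```

This also keeps working if the site ever returns absolute hrefs, because urljoin simply keeps an absolute URL as-is.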