How can I download a zipped file from the internet using pandas 0.17.1 and Python 3.5?
What am I doing wrong? Here is what I am trying to do:
import pandas as pd
url='http://data.octo.dc.gov/feeds/crime_incidents/archive/crime_incidents_2013_CSV.zip'
df = pd.read_csv(url, compression='gzip',
header=0, sep=',', quotechar='"',
engine = 'python')
IIUC, here is a solution: instead of passing the zip file directly to pandas, unzip it first and then pass the CSV file:
# Python 2 code: on Python 3, use io.BytesIO and urllib.request.urlopen instead
from StringIO import StringIO
from zipfile import ZipFile
from urllib import urlopen
import pandas as pd
url = urlopen("http://data.octo.dc.gov/feeds/crime_incidents/archive/crime_incidents_2013_CSV.zip")
zipfile = ZipFile(StringIO(url.read()))
name = zipfile.namelist()[0]  # name of the first member of the archive
with open(name, 'wb') as f:  # extract that member into the working directory
    f.write(zipfile.read(name))
df = pd.read_csv(name)
This produces the DataFrame.
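For Python 3, the same unzip-then-read idea can be sketched without writing the extracted CSV to disk. This is only a minimal sketch: the helper names `first_csv_from_zip` and `read_zipped_csv` are hypothetical, and it assumes the archive contains at least one member whose name ends in `.csv`.

```python
import io
import zipfile
from urllib.request import urlopen


def first_csv_from_zip(zip_bytes):
    """Return a binary file-like object for the first .csv member of a zip archive."""
    archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
    csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
    if not csv_names:
        raise ValueError("no .csv member found in the archive")
    return archive.open(csv_names[0])


def read_zipped_csv(url, **read_csv_kwargs):
    """Download a zip archive and load its first CSV member into a DataFrame."""
    import pandas as pd  # imported here so the helper above stays stdlib-only

    with urlopen(url) as response:
        zip_bytes = response.read()  # the whole archive is held in memory
    with first_csv_from_zip(zip_bytes) as member:
        return pd.read_csv(member, **read_csv_kwargs)
```

With the URL from the question, this would be called as `df = read_zipped_csv(url, header=0, sep=',')`. Since the whole archive is kept in memory, the on-disk approach may be preferable for very large downloads.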
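@Abbas, thanks a lot. I actually ran it step by step, and this is what I came up with. It is certainly not the fastest approach, but it works.
I am running pandas 0.18.1 with Python 3.5.1 on a Mac.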
from zipfile import ZipFile
from urllib.request import urlopen
import pandas as pd
import os
URL = \
'http://data.octo.dc.gov/feeds/crime_incidents/archive/crime_incidents_2013_CSV.zip'
# open and save the zip file onto computer
url = urlopen(URL)
output = open('zipFile.zip', 'wb') # note the flag: "wb"
output.write(url.read())
output.close()
# read the zip file as a pandas dataframe
df = pd.read_csv('zipFile.zip') # pandas version 0.18.1 takes zip files
# if keeping on disk the zip file is not wanted, then:
os.remove('zipFile.zip') # remove the copy of the zipfile on disk
Hope this helps. Thanks!
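One likely cause of the original failure is that the question passes `compression='gzip'` for a `.zip` archive, and zip and gzip are different formats. A quick way to check what was actually downloaded is to look at the first bytes of the file. The sketch below uses hypothetical helper names (`sniff_compression`, `sniff_file`) and only recognizes the two formats discussed here.

```python
def sniff_compression(head):
    """Guess the compression format from the leading bytes of a file."""
    if head.startswith(b"PK\x03\x04"):  # zip local file header signature
        return "zip"
    if head.startswith(b"\x1f\x8b"):  # gzip magic number
        return "gzip"
    return None  # unknown or uncompressed


def sniff_file(path):
    """Read the first four bytes of a file on disk and guess its format."""
    with open(path, "rb") as handle:
        return sniff_compression(handle.read(4))
```

For the file saved above, `sniff_file('zipFile.zip')` could be used to pick the value passed to the `compression` argument of `pd.read_csv`.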
Cy Bu's answer didn't quite work for me in Python 3.6 on Windows. I got an invalid argument error when trying to open the file. I modified it slightly:
import os
from urllib.request import urlopen, Request
import pandas as pd
url = 'http://data.octo.dc.gov/feeds/crime_incidents/archive/crime_incidents_2013_CSV.zip' # URL from the question
r = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
b2 = [z for z in url.split('/') if '.zip' in z][0] # gets just the '.zip' part of the url
with open(b2, "wb") as target:
    target.write(urlopen(r).read()) # saves the file to disk
data = pd.read_csv(b2, compression='zip') # opens the saved zip file
os.remove(b2) # removes the zip file
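The list comprehension above takes the first URL segment that contains '.zip', which could pick the wrong piece if the URL carries a query string or an unusual host name. A slightly more defensive sketch parses the URL and takes the last path component; the function name `filename_from_url` and the fallback name `download.zip` are assumptions made for illustration.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.zip"):
    """Return the last path component of a URL, ignoring query and fragment."""
    path = urlsplit(url).path  # drops the scheme, host, query and fragment
    name = posixpath.basename(unquote(path))
    return name or default  # fall back when the path ends with a slash
```

In the snippet above it could replace the list comprehension, as in `b2 = filename_from_url(url)`.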