Checking a file checksum in Python
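I have to write Python code that performs the following tasks:
1- Download the Movielens dataset from the url 'http://files.grouplens.org/datasets/movielens/ml-25m.zip'
2- Download the Movielens checksum from the url 'http://files.grouplens.org/datasets/movielens/ml-25m.zip.md5'
3- Check whether the checksum of the archive matches the downloaded one
4- If the check succeeds, print the names of the files contained in the downloaded archive
This is what I have written so far: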
from zipfile import ZipFile
from urllib import request
import hashlib

def md5(fname):
    # Hash the file in 4 KiB chunks so it never has to fit in memory
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

url_datasets = 'http://files.grouplens.org/datasets/movielens/ml-25m.zip'
datasets = 'datasets.zip'
url_checksum = 'http://files.grouplens.org/datasets/movielens/ml-25m.zip.md5'
checksum = 'datasets.zip.md5'  # must be defined before it is used below
request.urlretrieve(url_datasets, datasets)
request.urlretrieve(url_checksum, checksum)

with ZipFile(datasets, 'r') as zipObj:
    listOfiles = zipObj.namelist()
    for elem in listOfiles:
        print(elem)
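So what I am missing is a way to compare the checksum I computed with the one I downloaded. Maybe I could create a function "printFiles" that checks the checksum and, if it matches, prints the list of files.
Is there anything else I could improve?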
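Your code never actually compares the two checksums. Here is a version that fetches both files with requests and does the comparison: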
from zipfile import ZipFile
import hashlib
import requests

def md5(fname):
    # Reads the whole file into memory at once; fine for a one-off check
    hash_md5 = hashlib.md5()
    with open(fname, 'rb') as f:
        hash_md5.update(f.read())
    return hash_md5.hexdigest()

url_datasets = 'http://files.grouplens.org/datasets/movielens/ml-25m.zip'
datasets = 'datasets.zip'
url_checksum = 'http://files.grouplens.org/datasets/movielens/ml-25m.zip.md5'

# Download the archive and its checksum file, failing early on HTTP errors
ds = requests.get(url_datasets, allow_redirects=True)
cs = requests.get(url_checksum, allow_redirects=True)
ds.raise_for_status()
cs.raise_for_status()

with open(datasets, 'wb') as f:
    f.write(ds.content)

ds_md5 = md5(datasets)
# The first whitespace-separated token of the .md5 file is taken as the digest
cs_md5 = cs.content.decode('utf-8').split()[0]

print(ds_md5)
print(cs_md5)

if ds_md5 == cs_md5:
    print("MATCH")
    with ZipFile(datasets, 'r') as zipObj:
        listOfiles = zipObj.namelist()
        for elem in listOfiles:
            print(elem)
else:
    print("Checksum fail")
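
On the "anything else I could improve" part: the archive is large, so it may be worth streaming the download to disk and hashing it in chunks rather than holding it in memory. Below is a minimal sketch of the "printFiles" idea from the question, using only the standard library. The names download, md5_of_file, parse_md5_file and print_files are my own, not from any library, and I am assuming the .md5 file contains the digest as one whitespace-separated 32-character hex token (which would cover both a "digest  filename" layout and a "MD5 (filename) = digest" layout). I have not checked which layout the server actually uses.

```python
import hashlib
import shutil
from urllib.request import urlopen
from zipfile import ZipFile


def download(url, dest):
    """Stream `url` to the file `dest` without holding it all in memory."""
    with urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_file(path):
    """Return the first 32-character hex token found in a .md5 file.

    Assumption: the digest appears as its own whitespace-separated token.
    """
    with open(path, encoding="utf-8") as f:
        for token in f.read().split():
            token = token.lower()
            if len(token) == 32 and all(c in "0123456789abcdef" for c in token):
                return token
    raise ValueError(f"no MD5 digest found in {path}")


def print_files(archive, checksum_file):
    """Print the archive's member names if its MD5 matches; return the names."""
    expected = parse_md5_file(checksum_file)
    actual = md5_of_file(archive)
    if actual != expected:
        raise ValueError(f"checksum mismatch: expected {expected}, got {actual}")
    with ZipFile(archive) as zf:
        names = zf.namelist()
    for name in names:
        print(name)
    return names


# Usage with the variables from the answer above (performs real downloads):
#   download(url_datasets, 'datasets.zip')
#   download(url_checksum, 'datasets.zip.md5')
#   print_files('datasets.zip', 'datasets.zip.md5')
```

Raising on a mismatch instead of printing "Checksum fail" is a design choice: it keeps a corrupted archive from being silently used by whatever code runs next. If you prefer the print-and-continue behaviour, catch the ValueError at the call site.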