How to use image hash as filename for downloaded images?
In Python, I want to save images to files. The filename should be the hash generated by imagehash.average_hash(). With ls -l I can see the files, but they are empty:
-rw-r--r-- 1 lorem lorem 0 8 Sep 16:20 c4c0bcb49890bcfc.jpg
-rwxr-xr-x 1 lorem lorem 837 8 Sep 16:19 minimal.py
Code:
import requests
from PIL import Image
import imagehash
import shutil

def safe_to_file(url):
    headers = {
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'}
    image_hash = ''
    r = requests.get(url, headers=headers, timeout=10, stream=True)
    try:
        if r.status_code == 200:
            image_hash = str(imagehash.average_hash(Image.open(r.raw))) + '.jpg'
            print(image_hash)
            with open(image_hash, 'wb') as f:
                r.raw.decode_content = True
                shutil.copyfileobj(r.raw, f)
    except Exception as ex:
        print(str(ex))
    finally:
        return image_hash

# Random jpg picture
url = 'https://cdn.ebaumsworld.com/mediaFiles/picture/1035099/85708057.jpg'
safe_to_file(url)
I expected the images not to be empty. What am I doing wrong?
As I suspected, creating the PIL.Image object consumes all of the image data downloaded from the url, so there is nothing left for shutil.copyfileobj() to consume.
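The effect can be reproduced without a network connection. The following is only a minimal sketch: an in-memory io.BytesIO stands in for r.raw, and a plain read() stands in for the read that Image.open() is assumed to perform on the response stream. Both stand-ins are illustrative and are not part of the original code.

```python
import io
import shutil

# Stand-in for r.raw: a stream that hands out its bytes only once.
stream = io.BytesIO(b'fake image bytes')

# Stand-in for Image.open(r.raw): the first reader takes everything.
first = stream.read()

# Stand-in for the output file opened with open(image_hash, 'wb').
out = io.BytesIO()
shutil.copyfileobj(stream, out)  # the stream is already at its end

print(len(first), len(out.getvalue()))  # prints: 16 0
```

The second reader receives zero bytes, which matches the 0-byte .jpg files in the ls -l listing.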
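The code below seems to avoid the problem by explicitly saving the Image object under the desired hash-based filename. I've added comments to mark the significant changes.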
import imagehash
from PIL import Image
import requests
#import shutil

def safe_to_file(url):
    headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) '
                             'AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/53.0.2785.143 Safari/537.36'}
    image_hash = ''
    r = requests.get(url, headers=headers, timeout=10, stream=True)
    try:
        if r.status_code == 200:
            img = Image.open(r.raw)  # ADDED
            image_hash = str(imagehash.average_hash(img)) + '.jpg'  # CHANGED.
            print('saving image:', image_hash)
            img.save(image_hash)  # ADDED
#            with open(image_hash, 'wb') as f:  # REMOVED
#                r.raw.decode_content = True  # REMOVED
#                shutil.copyfileobj(r.raw, f)  # REMOVED
    except Exception as ex:
        print(str(ex))
    finally:
        return image_hash

# Random jpg picture
url = 'https://cdn.ebaumsworld.com/mediaFiles/picture/1035099/85708057.jpg'
safe_to_file(url)
The file it created:
c4c0bcb49890bcfc.jpg