How to save images from a URL into MongoDB using Python?
I have used the wikipedia package to get the list of image URLs from any Wikipedia page:
import wikipedia
et_page = wikipedia.page("Summer")
images = et_page.images
Now I want to save all of the images in the images variable to MongoDB, in a collection named images.
import pymongo
from PIL import Image
import io
client = pymongo.MongoClient("mongodb+srv://<>:<>@cluster0.lfrg6.mongodb.net/myFirstDatabase?retryWrites=true&w=majority")
database_name = 'test'
database = client[database_name]
collection = 'images'
image_collection = database[collection]
Is there any way to do this? Since there are multiple images, can they be saved in a list format?
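One simple option, if every image is small enough that the whole document stays well under MongoDB's 16 MB BSON document limit, is to download each URL and keep the raw bytes as bson.Binary values in a list inside a single document. This is only a minimal sketch under that size assumption; the field names and the "Summer" page value are illustrative, not taken from the original post:

import pymongo
from urllib.request import urlopen
from bson.binary import Binary

client = pymongo.MongoClient("mongodb+srv://<>:<>@cluster0.lfrg6.mongodb.net/myFirstDatabase?retryWrites=true&w=majority")
database = client['test']
image_collection = database['images']

# Download every URL in `images` (from the wikipedia snippet above) and
# collect the raw bytes; Binary() stores them as BSON binary data.
entries = []
for url in images:
    data = urlopen(url).read()
    entries.append({"url": url, "data": Binary(data)})

# One document per page, with all images held in a list field.
image_collection.insert_one({"page": "Summer", "images": entries})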
It is best not to use MongoDB as an arbitrary blob store, especially for large images; thumbnails and small infographics are fine. But since the OP is trying to understand how it can be done, the best way is to use GridFS. gridfs is part of the pymongo environment, so if you can import pymongo, you can import gridfs. Here is a working example:
import wikipedia
import pymongo
import gridfs
from urllib.request import urlopen
connstr = "mongodb://yourInfoHere"
client = pymongo.MongoClient(connstr)
database = client.testX
# This will create two collections that are under control
# of the gridfs object, images.chunks and images.files. Do
# not go to these collections directly; use the gridfs
# methods instead. The choice of "images" is arbitrary; you
# can use any name you wish. gridfs will add .chunks and .files
# to the real collection names.
# Docs are here
# https://pymongo.readthedocs.io/en/stable/api/gridfs/index.html#module-gridfs
gfs = gridfs.GridFS(database, collection="images")
page_name = "Summer"
print("capturing URLs to images on page",page_name)
et_page = wikipedia.page(page_name)
images = et_page.images
n = 0
for ii in images:
print("processing",ii)
f = urlopen(ii)
# put() "inserts" the file-like object into the gfs subsystem
# and returns an ID.
file_id = gfs.put(f)
# Make up a name and capture it AND the gridfs ID in a
# regular collection, called imageMeta here but it is
# any name you like. It is not strictly necessary to do this
# and it is completely separate from gridFS but you will almost
# always have a need to capture some metadata around the pix.
name = "IMAGE_" + str(n)
database.imageMeta.insert_one({"name":name, "fileId":file_id})
n += 1
# Here is an alternate solution where only 1 imageMeta doc is written
# but with arrays of image info. You STILL need to push each image
# individually into gridfs:
n = 0
info = []
for ii in images:
print("processing",ii)
f = urlopen(ii)
# put() "inserts" the file-like object into the gfs subsystem
# and returns an ID.
file_id = gfs.put(f)
# Make up a name and capture it AND the gridfs ID in a
# regular collection, called imageMeta here but it is
# any name you like.
name = "IMAGE_" + str(n)
info.append({"name":name, "fileId":file_id})
n += 1
database.imageMeta.insert_one({"page":page_name, "imageInfo":info});
# Here is how you can get your images out. Let's pick
# IMAGE_0 for example but obviously any query criteria on the
# imageMeta docs is valid:
doc = database.imageMeta.find_one({"name":"IMAGE_0"})
gg = gfs.get(doc['fileId'])
with open('foo.jpg', 'wb+') as wf:
    wf.write(gg.read()) # Nice read/write slurp
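If you went with the alternate, array-based imageMeta layout, retrieval works the same way through gridfs; here is a small sketch that assumes the page_name and gfs objects from the example above (the output file names are just illustrative):

# Fetch the single per-page metadata doc and walk its imageInfo array.
doc = database.imageMeta.find_one({"page": page_name})
for item in doc['imageInfo']:
    gg = gfs.get(item['fileId'])
    with open(item['name'] + ".jpg", 'wb') as wf:
        wf.write(gg.read())

One practical caveat, an observation about the Wikimedia servers rather than part of the answer above: they may reject downloads that carry urllib's default User-Agent. If urlopen raises HTTP 403, passing a Request with an explicit header usually helps (the agent string here is just a placeholder):

from urllib.request import Request, urlopen

req = Request(ii, headers={"User-Agent": "image-downloader-example/0.1"})
f = urlopen(req)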