How to save images from a URL into MongoDB using Python?

I have used the wikipedia package to get a list of image URLs from a Wikipedia page:

import wikipedia
et_page = wikipedia.page("Summer")
images = et_page.images

Now I want to save all the images in the images variable to MongoDB, in a collection named images.

import pymongo
from PIL import Image
import io

client = pymongo.MongoClient("mongodb+srv://<>:<>@cluster0.lfrg6.mongodb.net/myFirstDatabase?retryWrites=true&w=majority")

database_name = 'test'
database = client[database_name]

collection = 'images'
image_collection = database[collection]

Is there any way to do this? Since there are multiple images, can they be saved in list form?

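It is best not to use MongoDB as an arbitrary blob store, especially for large images. Thumbnails and small infographics are fine. But since the OP is trying to understand how it can be done, the best approach is to use GridFS. gridfs is part of the pymongo environment, so if you can import pymongo you can import gridfs. Here is a working example: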

import wikipedia
import pymongo
import gridfs
from urllib.request import urlopen

connstr = "mongodb://yourInfoHere"
client = pymongo.MongoClient(connstr)

database = client.testX

# This will create two collections that are under control
# of the gridfs object, images.chunks and images.files.  Do
# not go to these collections directly; use the gridfs
# methods instead.  The choice of "images" is arbitrary; you
# can use any name you wish.  gridfs will add .chunks and .files
# to the real collection names.
# Docs are here:
# https://pymongo.readthedocs.io/en/stable/api/gridfs/index.html#module-gridfs
gfs = gridfs.GridFS(database, collection="images")

page_name = "Summer"
print("capturing URLs to images on page",page_name)
et_page = wikipedia.page(page_name)
images = et_page.images

n = 0
for ii in images:
    print("processing", ii)
    # put() "inserts" the file-like object into the gfs subsystem
    # and returns an ID.  The with block closes the HTTP response
    # once the upload has finished.
    with urlopen(ii) as f:
        file_id = gfs.put(f)

    # Make up a name and capture it AND the gridfs ID in a
    # regular collection, called imageMeta here but it is
    # any name you like.  It is not strictly necessary to do this
    # and it is completely separate from gridFS but you will almost
    # always have a need to capture some metadata around the pix.
    name = "IMAGE_" + str(n)
    database.imageMeta.insert_one({"name": name, "fileId": file_id})
    n += 1

# Here is an alternate solution where only 1 imageMeta doc is written
# but with arrays of image info.  You STILL need to push each image
# individually into gridfs.  Use this instead of the loop above, not
# in addition to it, or every image is stored in gridfs twice:
n = 0
info = []
for ii in images:
    print("processing", ii)
    # put() "inserts" the file-like object into the gfs subsystem
    # and returns an ID.
    with urlopen(ii) as f:
        file_id = gfs.put(f)

    # Make up a name and collect it AND the gridfs ID in a list
    # that becomes one array field of a single imageMeta doc.
    name = "IMAGE_" + str(n)
    info.append({"name": name, "fileId": file_id})
    n += 1

database.imageMeta.insert_one({"page": page_name, "imageInfo": info})


# Here is how you can get your images out.  Let's pick
# IMAGE_0 for example but obviously any query criteria on the
# imageMeta docs is valid:
doc = database.imageMeta.find_one({"name": "IMAGE_0"})
gg = gfs.get(doc['fileId'])

with open('foo.jpg', 'wb') as wf:
    wf.write(gg.read())  # reads the whole image into memory, then writes it out
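
If you would rather not maintain a separate imageMeta collection, put() also accepts keyword arguments that are stored on the GridFS file document itself. The following is only a sketch under a few assumptions: the helper names describe_image and store_image, the source_url field and the User-Agent string are all made up for illustration and are not part of pymongo or of the code above. It also sends an explicit User-Agent header, because Wikimedia servers may reject requests that carry urllib's default one.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlparse
from urllib.request import Request, urlopen


def describe_image(url):
    """Derive GridFS metadata (filename, guessed MIME type) from an image URL."""
    # Last path segment of the URL, with %xx escapes decoded.
    filename = unquote(posixpath.basename(urlparse(url).path))
    content_type, _ = mimetypes.guess_type(filename)
    meta = {
        "filename": filename,
        "content_type": content_type,
        "source_url": url,  # hypothetical extra field, stored as-is on the file doc
    }
    # Drop fields we could not determine, e.g. an unknown file extension.
    return {key: value for key, value in meta.items() if value is not None}


def store_image(gfs, url, user_agent="image-archiver-example/0.1"):
    """Download url and store it in the given GridFS object; returns the file ID."""
    # The User-Agent value is a placeholder; use one that identifies your script.
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        # Extra keyword arguments to put() become fields on the GridFS file document.
        return gfs.put(response.read(), **describe_image(url))
```

With the gfs object from the example above you would call store_image(gfs, ii) inside the loop. A stored image can then be looked up by name, for instance with gfs.get_last_version("Some Field.jpg") or gfs.find_one({"source_url": ii}).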