How to create an lmdb file from cifar10 data_batch.bin
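I am able to read the binary format (cifar 10 data_batch1.bin) into a numpy matrix in Python, but I am having trouble writing it to an lmdb file. Can someone point me in the right direction?
I ran into the same problem a few months ago. The following resources helped me a lot: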
- http://deepdish.io/2015/04/28/creating-lmdb-in-python/
- http://research.beenfrog.com/code/2015/05/04/write-leveldb-lmdb-using-python.html
- https://github.com/BVLC/caffe/issues/1698#issuecomment-70211045
If I remember correctly, the following code worked for me (using 8-bit unsigned integer data, i.e. uint8):
import lmdb
import caffe

# Let images be a N x 3 x H x W matrix, i.e. N samples,
# 3 color channels (in BGR), height H and width W;
# you will need to get your images into the above
# blob shape (i.e. samples x channels x height x width).
# Let labels be a N x 1 matrix containing the labels.
N = images.shape[0]

# map_size is the maximum size the database may grow to;
# 10x the raw data size leaves room for serialization overhead.
env = lmdb.open('lmdb_path', map_size=images.nbytes * 10)

with env.begin(write=True) as txn:
    for i in range(N):
        datum = caffe.proto.caffe_pb2.Datum()
        datum.channels = images.shape[1]
        datum.height = images.shape[2]
        datum.width = images.shape[3]
        # tobytes() replaces the deprecated tostring()
        datum.data = images[i].tobytes()
        label = int(labels[i])
        datum.label = label
        # Alternatively, use:
        # datum = caffe.io.array_to_datum(images[i], label)
        str_id = '{:08}'.format(i)
        # You might need to check whether the encode is necessary in Python 2.7, I used Python 3:
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
Make sure your images are in the BGR color space: https://github.com/BVLC/caffe/wiki/Image-Format:-BGR-not-RGB
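For the CIFAR-10 case specifically, the BGR requirement means swapping the channel order after reading the batch file. Here is a minimal sketch of that step, assuming the binary version of CIFAR-10, where each record is 3073 bytes: one label byte followed by 1024 red, 1024 green and 1024 blue values in row-major order. The function name `parse_cifar10_batch` is made up for illustration and is not part of caffe or lmdb.

```python
import numpy as np

# One CIFAR-10 binary record: 1 label byte + 3 * 32 * 32 pixel bytes.
RECORD_BYTES = 1 + 3 * 32 * 32


def parse_cifar10_batch(raw):
    """Split the raw bytes of a CIFAR-10 binary batch into (images, labels).

    images: N x 3 x 32 x 32 uint8 array with channels reordered to BGR.
    labels: N x 1 int64 array.
    """
    if len(raw) % RECORD_BYTES != 0:
        raise ValueError("input length is not a multiple of 3073 bytes")

    records = np.frombuffer(raw, dtype=np.uint8).reshape(-1, RECORD_BYTES)

    # First byte of each record is the class label (0-9).
    labels = records[:, :1].astype(np.int64)

    # Remaining 3072 bytes are the R, G and B planes, in that order.
    rgb = records[:, 1:].reshape(-1, 3, 32, 32)

    # Reverse the channel axis (RGB -> BGR) and copy into a contiguous array.
    images = np.ascontiguousarray(rgb[:, ::-1, :, :])
    return images, labels
```

Reading a batch file in binary mode and passing `f.read()` to this function should give `images` and `labels` in the shapes the LMDB-writing code above expects.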