有没有好的方法可以存储很多小的 PDF 文件？

Question

我的工作性质要求我制作大量我正在分析的数据的 PDF 图像。在一天结束时，我可能只使用 10% 的图像作为概念的“证明”，但我仍然想保存所有图像，以防人们想要仔细检查我的作品。

我正在考虑将 PDF 文件存储在 hdf5 文件中，但据我所知这是不可能的（我与 hdf5 文件的唯一接口是通过 python 中的 h5py 模块） .

大家有什么推荐的吗？

Answer 1

可以在一个 HDF5 文件中存储（许多）PDF 文件。解决此问题的一种方法是为每个 PDF 创建一个数据集，并使该数据集的数据类型为一维不透明，其大小等于 PDF 文件的大小。如果您不受特定技术的约束，您可以使用 HDFql 解决您的用例，如下所示：

# import HDFql package
import HDFql

cursor = HDFql.Cursor()

# create an HDF5 file named 'pdf.h5' and use (i.e. open) it
HDFql.execute("CREATE AND USE FILE pdf.h5")

# get all files contained in root directory '/my_dir' which, for the sake
# of this example, contains the PDFs to store in the HDF5 file
HDFql.execute("SHOW FILE /my_dir/")

i = 0
while HDFql.cursor_next() == HDFql.SUCCESS:

    # get name of PDF file
    file_name = HDFql.cursor_get_char()

    HDFql.cursor_use(cursor)

    # get size of PDF file
    HDFql.execute("SHOW FILE SIZE \"/my_dir/%s\"" % file_name)

    HDFql.cursor_next()

    file_size = HDFql.cursor_get_unsigned_bigint()

    HDFql.cursor_use_default()

    # create dataset containing the content of the PDF file
    HDFql.execute("CREATE DATASET dset_%d AS OPAQUE(%d) VALUES FROM BINARY FILE \"/my_dir/%s\"" % (i, file_size, file_name))

    i += 1

有没有好的方法可以存储很多小的 PDF 文件？

Is there a good way to store a lot of small PDF files?

pdf

hdf5