Using joblib.Memory to cache data in AWS S3

Is it possible to use joblib.Memory to cache function output in AWS S3, for example by passing a remote link as the cachedir argument?

For example:

s3_url = 'https://foo.s3..../folder/cache_folder/project_name/joblib'
from joblib import Memory
memory = Memory(s3_url, verbose=0)

@memory.cache
def my_function(x): return x

Try this library:

https://github.com/aabadie/joblib-s3

From their documentation:

Getting the latest code

To get the latest code, use git:

git clone https://github.com/aabadie/joblib-s3.git

Installing joblibs3

Simply use pip:

$ cd joblib-s3
$ pip install -r requirements.txt .

Using joblibs3 to cache computation results in AWS S3

See the following example:

import numpy as np
from joblib import Memory
from joblibs3 import register_s3_store_backend

if __name__ == '__main__':
    register_s3_store_backend()

    # We assume your S3 credentials are stored in ~/.aws/credentials, so
    # there is no need to pass them to the Memory constructor.
    mem = Memory('joblib_cache', backend='s3', verbose=100, compress=True,
                 backend_options=dict(bucket="joblib-example"))

    multiply = mem.cache(np.multiply)
    array1 = np.arange(10000)
    array2 = np.arange(10000)

    result = multiply(array1, array2)
    print(result)
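
If it helps to see why a plain URL is not enough and a registered backend is needed, here is a rough, dependency-free sketch of the general idea: a cache only needs somewhere to put and get pickled results under a key derived from the function and its arguments. This is not how joblib or joblibs3 is actually implemented. DictStore, cached, and the key scheme are hypothetical names made up for illustration, and the in-memory dict merely stands in for an S3 bucket.

```python
import functools
import hashlib
import pickle


class DictStore:
    """Hypothetical stand-in for an S3 bucket: maps string keys to bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects.get(key)  # None when the key is absent


def cached(store):
    """Decorator that keeps pickled results of a function in `store`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key = function name + hash of the pickled arguments.
            payload = pickle.dumps((args, sorted(kwargs.items())))
            key = f"{func.__name__}/{hashlib.sha256(payload).hexdigest()}"
            blob = store.get(key)
            if blob is not None:
                return pickle.loads(blob)   # cache hit
            result = func(*args, **kwargs)  # cache miss: compute
            store.put(key, pickle.dumps(result))
            return result
        return wrapper
    return decorator


store = DictStore()
calls = []  # records every real execution of the function body


@cached(store)
def multiply(a, b):
    calls.append((a, b))
    return a * b


first = multiply(6, 7)
second = multiply(6, 7)  # served from the store, body not run again
print(first, second, len(calls))  # prints: 42 42 1
```

With joblibs3 the same roles are filled by the registered 's3' backend and the bucket given in backend_options, so the Memory object knows how to read and write cache entries there.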