How to write/load a machine learning model to/from an S3 bucket through joblib?
I have an ML model that I want to save in an S3 bucket.
from lightgbm.sklearn import LGBMClassifier
# Initialize model
mdl_lightgbm = LGBMClassifier(boosting_type='rf', objective='binary')
# Fit data
mdl_lightgbm.fit(X,Y)
# Save model to dictionary
mdl_dict = {'mdl_fitted':mdl_lightgbm}
For reasons of my own, I store the fitted model in a dictionary. The idea is to dump/load the model to/from an S3 bucket through joblib.
Saving the model to S3
Based on this idea, the following function lets you save the model either to an s3 bucket or locally through joblib:
import boto3
import joblib
from io import BytesIO

def write_joblib(file, path):
    '''
    Function to write a joblib file to an s3 bucket or local directory.

    Arguments:
    * file: The file that you want to save
    * path: an s3 bucket or local directory path
    '''
    # Path is an s3 bucket
    if path[:5] == 's3://':
        s3_bucket, s3_key = path.split('/')[2], path.split('/')[3:]
        s3_key = '/'.join(s3_key)
        with BytesIO() as f:
            joblib.dump(file, f)
            f.seek(0)
            boto3.client("s3").upload_fileobj(Bucket=s3_bucket, Key=s3_key, Fileobj=f)
    # Path is a local directory
    else:
        with open(path, 'wb') as f:
            joblib.dump(file, f)
In your example, to save the model to an s3 bucket, simply call
write_joblib(mdl_dict, 's3://bucket_name/mdl_dict.joblib')
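The `s3://` path is split into bucket and key with plain string operations. A minimal sketch of that parsing, using a hypothetical path for illustration:

```python
# Hypothetical S3 path, for illustration only
path = 's3://bucket_name/models/mdl_dict.joblib'

# split('/') -> ['s3:', '', 'bucket_name', 'models', 'mdl_dict.joblib']
parts = path.split('/')
s3_bucket = parts[2]          # bucket name: 'bucket_name'
s3_key = '/'.join(parts[3:])  # object key:  'models/mdl_dict.joblib'

print(s3_bucket, s3_key)
```

Note that keys with nested "directories" work because everything after the bucket name is rejoined with `/`.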
Loading the model from S3
Similarly, following the same idea, the function below lets you load the model from an s3 bucket or a local file:
def read_joblib(path):
    '''
    Function to load a joblib file from an s3 bucket or local directory.

    Arguments:
    * path: an s3 bucket or local directory path where the file is stored

    Outputs:
    * file: Joblib file loaded
    '''
    # Path is an s3 bucket
    if path[:5] == 's3://':
        s3_bucket, s3_key = path.split('/')[2], path.split('/')[3:]
        s3_key = '/'.join(s3_key)
        with BytesIO() as f:
            boto3.client("s3").download_fileobj(Bucket=s3_bucket, Key=s3_key, Fileobj=f)
            f.seek(0)
            file = joblib.load(f)
    # Path is a local directory
    else:
        with open(path, 'rb') as f:
            file = joblib.load(f)
    return file
In your case, to load the file back from the same s3 bucket, use the following lines:
mdl_lightgbm = read_joblib('s3://bucket_name/mdl_dict.joblib')
mdl_lightgbm = mdl_lightgbm['mdl_fitted']
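The core of both functions is the same dump → `seek(0)` → load round trip through an in-memory buffer. You can verify that pattern without touching S3; this sketch substitutes the stdlib `pickle` module for joblib (joblib's serialization is pickle-based) and a plain dictionary for the fitted model, so it runs with no extra dependencies:

```python
import pickle
from io import BytesIO

# Stand-in for the {'mdl_fitted': model} dictionary from the question
mdl_dict = {'mdl_fitted': 'stand-in for the fitted model'}

with BytesIO() as f:
    pickle.dump(mdl_dict, f)   # joblib.dump(...) in the answer above
    f.seek(0)                  # rewind the buffer before reading/uploading
    restored = pickle.load(f)  # joblib.load(...) in the answer above

print(restored['mdl_fitted'])
```

Forgetting `f.seek(0)` is the classic failure mode here: the buffer position is at the end after writing, so `upload_fileobj` (or a load) would see zero bytes.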