How to load a model saved in joblib file from Google Cloud Storage bucket
I want to load a model that has been saved as a joblib file in a Google Cloud Storage bucket. When it sits on a local path we can load it like this (assuming model_file is the full path on the filesystem):
loaded_model = joblib.load(model_file)
How can we accomplish the same thing with Google Cloud Storage?
I don't think that is possible, at least not in a direct way. I do have a workaround, though it may not be as efficient as you would like.

By using the Google Cloud Storage client library [1] you can download the model file first, load it, and delete it once your program finishes. Of course, this means you have to download the file every time you run the code. Here is a snippet:
from google.cloud import storage
import joblib  # on older scikit-learn versions: from sklearn.externals import joblib
storage_client = storage.Client()
bucket_name = '<bucket name>'
model_bucket='model.joblib'
model_local='local.joblib'
bucket = storage_client.get_bucket(bucket_name)
#select bucket file
blob = bucket.blob(model_bucket)
#download that file and name it 'local.joblib'
blob.download_to_filename(model_local)
#load that file from local file
job=joblib.load(model_local)
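The answer above also mentions deleting the local copy when your program ends; a minimal sketch of that cleanup step, reusing the model_local name from the snippet:

import os

try:
    job = joblib.load(model_local)
    # ... use the model here ...
finally:
    # remove the downloaded copy once we are done with it
    if os.path.exists(model_local):
        os.remove(model_local)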
For anyone who ends up here from googling for an answer to this question:

Besides the obvious option of using Google AI Platform for model hosting (and online predictions), there are two more options.
Option 1 is to use TemporaryFile, like this:
from google.cloud import storage
import joblib  # on older scikit-learn versions: from sklearn.externals import joblib
from tempfile import TemporaryFile
storage_client = storage.Client()
bucket_name = '<bucket name>'
model_bucket='model.joblib'
bucket = storage_client.get_bucket(bucket_name)
#select bucket file
blob = bucket.blob(model_bucket)
with TemporaryFile() as temp_file:
    # download blob into temp file
    blob.download_to_file(temp_file)
    temp_file.seek(0)
    # load into joblib
    model = joblib.load(temp_file)

# use the model
model.predict(...)
Option 2 is to use BytesIO, like this:
from google.cloud import storage
import joblib  # on older scikit-learn versions: from sklearn.externals import joblib
from io import BytesIO
storage_client = storage.Client()
bucket_name = '<bucket name>'
model_bucket='model.joblib'
bucket = storage_client.get_bucket(bucket_name)
#select bucket file
blob = bucket.blob(model_bucket)
#download blob into an in-memory file object
model_file = BytesIO()
blob.download_to_file(model_file)
# rewind the in-memory file before reading it
model_file.seek(0)
# load into joblib
model = joblib.load(model_file)
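As a side note (this assumes a reasonably recent google-cloud-storage release), the client also exposes blob.download_as_bytes(), which lets you skip managing the BytesIO object yourself:

# assumes a google-cloud-storage version where download_as_bytes() is available
model = joblib.load(BytesIO(blob.download_as_bytes()))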
An alternative answer as of 2020, using tf2 - you can do it like this:
import joblib
import tensorflow as tf
gcs_path = 'gs://yourpathtofile'
loaded_model = joblib.load(tf.io.gfile.GFile(gcs_path, 'rb'))
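The same gfile API can also write back to the bucket. A minimal sketch of the reverse direction, assuming your TensorFlow build has GCS filesystem support enabled:

# dump a model straight to GCS without a local intermediate file
with tf.io.gfile.GFile(gcs_path, 'wb') as f:
    joblib.dump(loaded_model, f)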
For people googling around for an answer to this question - here is another option. The open source modelstore library is a wrapper that deals with the process of saving, uploading, and downloading models from Google Cloud Storage.

Under the hood, it saves scikit-learn models using joblib, creates a tar archive with the files, and up/downloads them from your Google Cloud Storage bucket via the storage client library (downloads use blob.download_to_filename()).

In practice it looks a bit like this (full example is here):
import os

# Create modelstore instance
from modelstore import ModelStore

modelstore = ModelStore.from_gcloud(
    os.environ["GCP_PROJECT_ID"],   # Your GCP project ID
    os.environ["GCP_BUCKET_NAME"],  # Your Cloud Storage bucket name
)

# Train and upload a model (this currently works with 9 different ML frameworks)
model = train()  # Replace with your code to train a model
meta_data = modelstore.sklearn.upload("my-model-domain", model=model)

# ... and later when you want to download it
model_path = modelstore.download(
    local_path="/path/to/a/directory",
    domain="my-model-domain",
    model_id=meta_data["model"]["model_id"],
)
I found using gcsfs to be the fastest (and most compact) way to do it:

import gcsfs
import joblib

def load_joblib(bucket_name, file_name):
    fs = gcsfs.GCSFileSystem()
    with fs.open(f'{bucket_name}/{file_name}') as f:
        return joblib.load(f)
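For example, you would call it with your own bucket and file names (the names below are placeholders), and the reverse direction works the same way by opening the file in 'wb' mode:

# '<bucket name>' and 'model.joblib' are placeholders - replace with your own
model = load_joblib('<bucket name>', 'model.joblib')

def save_joblib(model, bucket_name, file_name):
    fs = gcsfs.GCSFileSystem()
    with fs.open(f'{bucket_name}/{file_name}', 'wb') as f:
        joblib.dump(model, f)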