Load Saved CatBoost Model (.cbm) from Google Cloud Storage

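I am trying to write a script that loads a saved CatBoost model from a Cloud Storage bucket and uses it to make predictions. However, I cannot get the file to load. CatBoost raises an error saying the model file does not exist, even though I copied the path directly from the UI.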

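I am using Google Cloud Platform. The script runs in an AI Platform JupyterLab notebook in the same project as the bucket that stores the model. The feature set I use for predictions is stored in the same bucket as the model, and I can read that file into a dataframe (X_eval) without problems.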

I tried both the URI ("gs://...") and the authenticated URL ("https://..."), and both raise the same error.

#Specify model path
path = 'gs://bucket_id/model-name'

# Load model
from_file = CatBoostClassifier()
model = from_file.load_model(path)

model.predict(X_eval)
---------------------------------------------------------------------------
CatBoostError                             Traceback (most recent call last)
<ipython-input-9-f7a6068f5718> in <module>
     70 
     71 if __name__ == "__main__":
---> 72     main('data','context')

<ipython-input-9-f7a6068f5718> in main(data, context)
     42     # Load model
     43     from_file = CatBoostClassifier()
---> 44     from_file.load_model(path)
     45 
     46     model.predict(X_eval)

/opt/conda/lib/python3.7/site-packages/catboost/core.py in load_model(self, fname, format, stream, blob)
   2655 
   2656         if fname is not None:
-> 2657             self._load_model(fname, format)
   2658         elif stream is not None:
   2659             self._load_from_stream(stream)

/opt/conda/lib/python3.7/site-packages/catboost/core.py in _load_model(self, model_file, format)
   1345             raise CatBoostError("Invalid fname type={}: must be str().".format(type(model_file)))
   1346 
-> 1347         self._object._load_model(model_file, format)
   1348         self._set_trained_model_attributes()
   1349         for key, value in iteritems(self._get_params()):

_catboost.pyx in _catboost._CatBoost._load_model()

_catboost.pyx in _catboost._CatBoost._load_model()

CatBoostError: catboost/libs/model/model_import_interface.h:19: Model file doesn't exist: gs://bucket_id/model-name

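If I upload the same model file to the local file system (that is, the file system of the VM the JupyterLab notebook runs on), the model loads successfully. For example, this works: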

#Specify model path
path = 'model-name'

# Load model
from_file = CatBoostClassifier()
model = from_file.load_model(path)

model.predict(X_eval)

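I solved this using Ture Friese's answer to a related question. The approach is to download the file into an in-memory file object with BytesIO, load the model from that file object, and then use it to predict on the dataframe X_eval: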

from io import BytesIO

from catboost import CatBoostClassifier
from google.cloud import storage

storage_client = storage.Client()

# Storage variables
model_bucket_id = 'bucket_id'  # Replace with your bucket ID
model_bucket = storage_client.get_bucket(model_bucket_id)
model_name = 'model-name'  # Replace with the file name of the model

# Select bucket file
blob = model_bucket.blob(model_name)

# Download blob into an in-memory file object
model_file = BytesIO()
blob.download_to_file(model_file)
model_file.seek(0)  # Rewind so the model is read from the start

# Load model from in-memory file object
from_file = CatBoostClassifier()
model = from_file.load_model(stream=model_file)

model.predict(X_eval)
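
Since the question notes that loading from a local path works, another option is to copy the blob to a temporary local file and load it by path. The following is only a sketch under that assumption: `load_model_via_temp_file` is a hypothetical helper name, `blob` is assumed to offer `download_to_filename(path)` as a `google.cloud.storage` blob does, and `model` is assumed to offer `load_model(path)` as `CatBoostClassifier` does.

```python
import os
import tempfile


def load_model_via_temp_file(blob, model):
    """Download `blob` to a temporary local file, then load `model` from it.

    Hypothetical helper: `blob` needs download_to_filename(path) and
    `model` needs load_model(path).
    """
    # Create an empty temporary file and close our handle to it, so the
    # blob can write to the path on any platform.
    fd, tmp_path = tempfile.mkstemp(suffix=".cbm")
    os.close(fd)
    try:
        blob.download_to_filename(tmp_path)
        model.load_model(tmp_path)
    finally:
        # Remove the local copy whether or not loading succeeded.
        os.remove(tmp_path)
    return model


# Example use (requires catboost and google-cloud-storage):
# from catboost import CatBoostClassifier
# from google.cloud import storage
# blob = storage.Client().bucket('bucket_id').blob('model-name')
# model = load_model_via_temp_file(blob, CatBoostClassifier())
```

This costs a round trip through disk, so the in-memory approach is probably preferable when the model fits comfortably in memory.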

There is a better way, although it does not appear to be documented:

import catboost as cb
from google.cloud import storage

storage_client = storage.Client()

bucket_name = "catboost-models" # put your bucket name here
blob_name = "mymodel" # put the blob name from the bucket here

blob = storage_client.bucket( bucket_name ).blob( blob_name ).download_as_bytes()

model = cb.CatBoostClassifier()
model.load_model( blob = blob )
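
If you would rather keep working with the `gs://...` URI from the question, the bytes-based approach can be wrapped in a small helper that splits the URI into bucket and blob names first. This is a minimal sketch, not part of either library: `split_gs_uri` and `load_catboost_from_gs_uri` are hypothetical names, and the only library calls used are the ones already shown in the snippet this follows.

```python
def split_gs_uri(uri):
    """Split 'gs://bucket/path/to/blob' into (bucket_name, blob_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError("expected a gs:// URI, got: {!r}".format(uri))
    # Everything up to the first '/' is the bucket; the rest is the blob name.
    bucket_name, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket_name or not blob_name:
        raise ValueError("URI must name a bucket and a blob: {!r}".format(uri))
    return bucket_name, blob_name


def load_catboost_from_gs_uri(uri):
    """Hypothetical wrapper: fetch the model bytes and load them with CatBoost."""
    # Imported here so split_gs_uri can be used without these packages.
    import catboost as cb
    from google.cloud import storage

    bucket_name, blob_name = split_gs_uri(uri)
    data = storage.Client().bucket(bucket_name).blob(blob_name).download_as_bytes()
    model = cb.CatBoostClassifier()
    model.load_model(blob=data)
    return model


# Example use:
# model = load_catboost_from_gs_uri('gs://bucket_id/model-name')
# model.predict(X_eval)
```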