"hybrid models" 的模型记录(例如 SKlearn 管道,包括 KerasWrapper)可能吗?
Model-logging for "hybrid models" (e.g. SKlearn Pipeline including KerasWrapper) possible?
I have wrapped my keras-tf model in an sklearn Pipeline, which also does some pre- and post-processing. I would like to serialize this model and capture its dependencies via MLflow.
I tried mlflow.keras.save_model()
, which does not seem to fit. (It is not a "pure" Keras model and has no save()
attribute.)
I also tried mlflow.sklearn.save_model()
and mlflow.pyfunc.save_model()
; both lead to the same error:
NotImplementedError: numpy() is only available when eager execution is enabled.
(This error seems to stem from a version conflict between Python and TensorFlow, perhaps?)
I would like to know: is it possible at all / in general to serialize such "hybrid" models with mlflow?
Please find a minimal example below.
# In[1]:
import numpy as np
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
# ### Save Keras Model
# In[2]:
iris_data = load_iris()
x = iris_data.data
y_ = iris_data.target.reshape(-1, 1)
# One Hot encode the class labels
encoder = OneHotEncoder(sparse=False)
y = encoder.fit_transform(y_)
# Split the data for training and testing
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.20)
# Build the model
model = Sequential()
model.add(Dense(10, input_shape=(4,), activation='relu', name='fc1'))
model.add(Dense(10, activation='relu', name='fc2'))
model.add(Dense(3, activation='softmax', name='output'))
optimizer = Adam(lr=0.001)
model.compile(optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(train_x, train_y, verbose=2, batch_size=5, epochs=20)
# In[3]:
import mlflow.keras
mlflow.keras.save_model(model, "modelstorage/model40")
# ### Save Minimal SKlearn-Pipeline (with Keras)
# In[4]:
from category_encoders.target_encoder import TargetEncoder
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from keras.wrappers.scikit_learn import KerasClassifier
# In[5]:
def define_model():
    """
    Create a fully connected network for the iris data.
    """
    keras_model = Sequential()
    keras_model.add(Dense(10, input_shape=(4,), activation='relu', name='fc1'))
    keras_model.add(Dense(10, activation='relu', name='fc2'))
    keras_model.add(Dense(3, activation='softmax', name='output'))
    optimizer = Adam(lr=0.001)
    keras_model.compile(optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return keras_model  # note: originally returned the outer `model`, a bug
# In[6]:
# target_encoder = TargetEncoder()
scaler = StandardScaler()
keras_model = KerasClassifier(define_model, batch_size=5, epochs=20)
# In[7]:
pipeline = Pipeline([
    # ('encoding', target_encoder),
    ('scaling', scaler),
    ('modeling', keras_model)
])
# In[8]:
pipeline.fit(train_x, train_y)
# In[9]:
mlflow.keras.save_model(pipeline, "modelstorage/model42")  # not working: the pipeline is not a Keras model
# In[10]:
import mlflow.sklearn
mlflow.sklearn.save_model(pipeline, "modelstorage/model43")
Output from modelstorage/model43/conda.yaml:
======================
channels:
- defaults
dependencies:
- python=3.6.7
- scikit-learn=0.21.2
- pip:
  - mlflow
  - cloudpickle==1.2.1
name: mlflow-env
======================
The generated environment doesn't seem to capture TensorFlow (or Keras).
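The gap can be checked programmatically. A minimal sketch, using only the conda.yaml contents shown above inlined as a string (in practice you would read modelstorage/model43/conda.yaml from disk):

```python
# Contents of the generated conda.yaml, as printed above.
conda_yaml = """\
channels:
- defaults
dependencies:
- python=3.6.7
- scikit-learn=0.21.2
- pip:
  - mlflow
  - cloudpickle==1.2.1
name: mlflow-env
"""

def lists_package(yaml_text, package):
    """Return True if any dependency line names the given package."""
    deps = [line.strip().lstrip("- ") for line in yaml_text.splitlines()]
    return any(d == package or d.startswith(package + "=") for d in deps)

print(lists_package(conda_yaml, "scikit-learn"))  # True
print(lists_package(conda_yaml, "tensorflow"))    # False
```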
You can add extra dependencies when saving the model. For example, if your pipeline contains a Keras step, you can add keras and tensorflow:
conda_env = mlflow.sklearn.get_default_conda_env()
conda_env["dependencies"] = ['keras==2.2.4', 'tensorflow==1.14.0'] + conda_env["dependencies"]
mlflow.sklearn.log_model(pipeline, "modelstorage/model43", conda_env=conda_env)
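The effect of that merge can be seen without MLflow installed. A minimal sketch, where the dict below is a hand-written stand-in shaped like what mlflow.sklearn.get_default_conda_env() returned above (its exact contents vary by MLflow version, so this is an assumption):

```python
# Stand-in for mlflow.sklearn.get_default_conda_env(): a plain dict
# mirroring the conda.yaml printed above.
conda_env = {
    "name": "mlflow-env",
    "channels": ["defaults"],
    "dependencies": [
        "python=3.6.7",
        "scikit-learn=0.21.2",
        {"pip": ["mlflow", "cloudpickle==1.2.1"]},
    ],
}

# Prepend the keras/tensorflow pins, exactly as in the answer above.
conda_env["dependencies"] = (
    ["keras==2.2.4", "tensorflow==1.14.0"] + conda_env["dependencies"]
)

print(conda_env["dependencies"][:2])
# ['keras==2.2.4', 'tensorflow==1.14.0']
```

Note that this prepends the pins as conda packages; if you need them installed via pip instead, they would have to go into the nested "pip" list.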