How to select and run a model from a Dash dropdown menu and update the confusion matrix figure?

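I am building an ML prediction Dash app based on this breast cancer dataset.

From the dropdown menu, I want to be able to select one of my models, run the fit, and return an updated confusion matrix (heatmap).

I plan to extend the script to tables, ROC curves, learning curves, etc. (i.e. a multi-output callback), but first I want this part to work before implementing the other elements.

I have tried different things.

For example, before the current code (below), I tried calling the model directly from the dropdown and then doing all the cm calculations in the callback, which resulted in AttributeError: 'str' object has no attribute 'fit':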

@app.callback(Output('conf_matrix', 'figure'), [Input('dropdown-5', 'value')])
def update_cm_matix(model):
    class_names=[0,1]
    fitModel = model.fit(X_train, y_train)
    y_pred = fitModel.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    return {'data': [go.Heatmap(x=class_names, y=class_names, z=cm, showscale=True, colorscale='blues')],
            'layout': dict(width=350, height=280, margin={'t': 10},
                       xaxis=dict(title='Predicted class', tickvals=[0, 1]),
                       yaxis=dict(title='True class', tickvals=[0, 1], autorange='reversed'))}

(replacing the app.callback and the function in the script below).

The current version I am struggling with is:

# -*- coding: utf-8 -*-
import dash
import dash_core_components as dcc
import dash_html_components as html
import dash_bootstrap_components as dbc
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.feature_selection import RFE
import plotly.graph_objs as go
from dash.dependencies import Input, Output

app = dash.Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])
server = app.server

app.config.suppress_callback_exceptions = True

df = pd.read_csv("breast_cancer.csv")
y = np.array(df.diagnosis.tolist())
data = df.drop('diagnosis', axis=1)
X = np.array(data.values)

scaler = StandardScaler()
X = scaler.fit_transform(X)

random_state = 42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=random_state)

# First model: logistic model + optimize hyperparameters
log = LogisticRegression(random_state=random_state)
param_grid = {'penalty': ['l2', 'l1'], 'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]}
CV_log = GridSearchCV(estimator=log, param_grid=param_grid, scoring='accuracy', verbose=1, n_jobs=-1)
CV_log.fit(X_train, y_train)
log_best_params = CV_log.best_params_
log_clf = LogisticRegression(C=log_best_params['C'], penalty=log_best_params['penalty'], random_state=random_state)

# Second model: logistic model with recursive features elimination (just for illustration purposes, other models will be included)
rfe_selector = RFE(log_clf)

# app layout
app.layout = html.Div([
    html.Div([
        dcc.Dropdown(
            id='dropdown-5',
            options=[{'label': 'Logistic', 'value': 'log_clf'},
                     {'label': 'RFE', 'value': 'rfe_selector'}],
            value='log_clf',
            style={'width': '150px', 'height': '35px', 'fontSize': '10pt'}
        )], style={}),

    html.Div([
        dcc.Graph(id='conf_matrix')
    ])
])

# function to run selected model
def ClassTrainEval(model):
    fitModel = model.fit(X_train, y_train)
    y_pred = fitModel.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    return fitModel, y_pred, y_score, cm

models = [log_clf, rfe_selector]
class_names = [0,1]

# dash callback
@app.callback(Output('conf_matrix', 'figure'), [Input('dropdown-5', 'value')])
def update_cm_matix(model):
    for model in models:
        ClassTrainEval(model)
    return {'data': [go.Heatmap(x=class_names, y=class_names, z=cm, showscale=True, colorscale='blues')],
            'layout': dict(width=350, height=280, margin={'t': 10},
                           xaxis=dict(title='Predicted class', tickvals=[0, 1]),
                           yaxis=dict(title='True class', tickvals=[0, 1], autorange='reversed'))}

if __name__ == '__main__':
    app.run_server(debug=True)

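I get a NameError: name 'cm' is not defined error.

I am not quite sure how to move forward to make this work, so I hope someone can point me in the right direction.

Thanks!

Your code has several errors. Let's work through your two attempts in turn.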

dcc.Dropdown(
        id='dropdown-5',
        options=[{'label': 'Logistic', 'value': 'log_clf'},
                 {'label': 'RFE', 'value': 'rfe_selector'}],
        value='log_clf',
        style={'width': '150px', 'height': '35px', 'fontSize': '10pt'}
    )], style={})

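In your dropdown, the model is a string (type('log_clf') == str), so you cannot train it. You need to map the dropdown values to the actual model objects and write the dropdown as follows:

models = {'Logistic': log_clf, 'RFE': rfe_selector}
# ... (some lines of code skipped) ...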
dcc.Dropdown(
        id='dropdown-5',
        options=[{'label': v, 'value': v} for v in ['Logistic','RFE']],
        value='Logistic',
        style={'width': '150px', 'height': '35px', 'fontSize': '10pt'}
    )
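
To see why the lookup is needed, here is a minimal sketch of the difference between the dropdown's string value and the object it names. `StubModel` is a hypothetical stand-in for a scikit-learn estimator, used only so the snippet runs without any dependencies; it is not part of the original script.

```python
class StubModel:
    """Hypothetical stand-in for an estimator with a fit() method."""

    def __init__(self, name):
        self.name = name
        self.fitted = False

    def fit(self, X, y):
        self.fitted = True
        return self  # scikit-learn estimators also return self from fit()


# The dropdown only ever sends a string to the callback.
dropdown_value = 'Logistic'

# Calling .fit on the string reproduces the error from the first attempt.
try:
    dropdown_value.fit([[0], [1]], [0, 1])
except AttributeError as exc:
    error_message = str(exc)

# Mapping the string to the real object is what makes the call work.
models = {'Logistic': StubModel('log_clf'), 'RFE': StubModel('rfe_selector')}
selected = models[dropdown_value].fit([[0], [1]], [0, 1])
```

The dictionary keys double as the dropdown's `value` entries, so the callback can resolve any selection with a single `models[v]` lookup.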

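For the second attempt, you also need one extra line to adapt to the change I made (looking the model up in the dictionary by the dropdown value).

The error, NameError: name 'cm' is not defined (I assume it is raised in the callback), occurs because you do not assign the function's output to a variable.

The function is: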

# function to run selected model
def ClassTrainEval(model):
    fitModel = model.fit(X_train, y_train)
    y_pred = fitModel.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    return fitModel, y_pred, y_score, cm  # Note: y_score is never defined, so it must be removed
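
For reference, a corrected version of this function might look like the sketch below. It is not the asker's exact code: the data are passed in as arguments instead of being read from globals, a synthetic dataset from `make_classification` stands in for `breast_cancer.csv`, and the name `class_train_eval` and the `max_iter=1000` setting are my own assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def class_train_eval(model, X_train, y_train, X_test, y_test):
    """Fit the model, predict on the test set, return the confusion matrix."""
    fit_model = model.fit(X_train, y_train)
    y_pred = fit_model.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    return fit_model, y_pred, cm  # only names that are actually defined


# Synthetic binary data, standing in for the real dataset (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

models = {'Logistic': LogisticRegression(max_iter=1000),
          'RFE': RFE(LogisticRegression(max_iter=1000))}

# Each entry unpacks as (fit_model, y_pred, cm).
results = {name: class_train_eval(m, X_train, y_train, X_test, y_test)
           for name, m in models.items()}
```

Returning exactly three values keeps the function in step with a three-name unpacking at the call site.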

Then in the callback you have:

# dash callback
@app.callback(Output('conf_matrix', 'figure'), [Input('dropdown-5', 'value')])
def update_cm_matix(model):
    for model in models:  # <------- No loop needed
        ClassTrainEval(model)  # <------- Here you need to assign the output
    return {'data': [go.Heatmap(x=class_names, y=class_names, z=cm, showscale=True, colorscale='blues')],
            'layout': dict(width=350, height=280, margin={'t': 10},
                           xaxis=dict(title='Predicted class', tickvals=[0, 1]),
                           yaxis=dict(title='True class', tickvals=[0, 1], autorange='reversed'))}
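
The scoping problem in this callback can be reproduced in isolation. The sketch below uses hypothetical function names and a made-up matrix, purely to show that a name defined inside one function is not visible in another unless the return value is assigned.

```python
def compute():
    cm = [[50, 3], [2, 45]]  # local to compute(); made-up numbers
    return cm


def broken_callback():
    compute()         # return value is discarded
    return {'z': cm}  # 'cm' does not exist in this scope


def fixed_callback():
    cm = compute()    # bind the return value to a local name
    return {'z': cm}


try:
    broken_callback()
except NameError as exc:
    error_message = str(exc)

figure = fixed_callback()
```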

You probably want to write:

@app.callback(Output('conf_matrix', 'figure'), [Input('dropdown-5', 'value')])
def update_cm_matix(v):
    model = models[v]
    fitModel, y_pred, cm = ClassTrainEval(model)
    return {'data': [go.Heatmap(x=class_names, y=class_names, z=cm, showscale=True, colorscale='blues')],
            'layout': dict(width=350, height=280, margin={'t': 10},
                           xaxis=dict(title='Predicted class', tickvals=[0, 1]),
                           yaxis=dict(title='True class', tickvals=[0, 1], autorange='reversed'))}
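
If more outputs are added later, it may help to move the figure construction into its own function so it can be checked without running the app. The sketch below is one possible way to do that, not part of the original answer: `make_cm_figure` is a hypothetical helper, and it builds the figure as a plain dict with a `'type': 'heatmap'` trace rather than a `go.Heatmap` object.

```python
def make_cm_figure(cm, class_names=(0, 1)):
    """Build the heatmap figure as a plain dict for dcc.Graph's figure prop."""
    labels = list(class_names)
    return {
        'data': [{'type': 'heatmap', 'x': labels, 'y': labels, 'z': cm,
                  'showscale': True, 'colorscale': 'Blues'}],
        'layout': {'width': 350, 'height': 280, 'margin': {'t': 10},
                   'xaxis': {'title': 'Predicted class', 'tickvals': labels},
                   'yaxis': {'title': 'True class', 'tickvals': labels,
                             'autorange': 'reversed'}},
    }


# Made-up confusion matrix, only to show the shape of the result.
figure = make_cm_figure([[105, 3], [4, 59]])
```

The callback body would then reduce to looking up the model, calling ClassTrainEval, and returning `make_cm_figure(cm)`. Passing `cm.tolist()` instead of the NumPy array is a conservative choice if serialization is a concern.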