Negative SHAP values in H2O in Python using predict_contributions
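I have been trying to compute SHAP values for a gradient boosting classifier with the H2O module in Python. Below is an example adapted from the predict_contributions method documentation (https://github.com/h2oai/h2o-3/blob/master/h2o-py/demos/predict_contributionsShap.ipynb).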
import h2o
import shap
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o import H2OFrame
# initialize H2O
h2o.init()
# load JS visualization code to notebook
shap.initjs()
# Import the prostate dataset
h2o_df = h2o.import_file("https://raw.github.com/h2oai/h2o/master/smalldata/logreg/prostate.csv")
# Convert the response column to a factor (before splitting, so the splits get the factor column too)
h2o_df["CAPSULE"] = h2o_df["CAPSULE"].asfactor()
# Split the data into Train/Test/Validation with Train having 70% and test and validation 15% each
train,test,valid = h2o_df.split_frame(ratios=[.7, .15])
# Generate a GBM model (trained here on the full h2o_df frame, not on train)
model = H2OGradientBoostingEstimator(distribution="bernoulli",
                                     ntrees=100,
                                     max_depth=4,
                                     learn_rate=0.1)
model.train(y="CAPSULE", x=["AGE","RACE","PSA","GLEASON"], training_frame=h2o_df)
# calculate SHAP values using function predict_contributions
contributions = model.predict_contributions(h2o_df)
# convert the H2O Frame to use with shap's visualization functions
contributions_matrix = contributions.as_data_frame().to_numpy()  # the original demo used pandas' as_matrix(), which newer pandas versions no longer provide
# the first four columns are the per-feature SHAP contributions
shap_values = contributions_matrix[:,0:4]
# the last column (BiasTerm) is the expected value; it is the same for every row
expected_value = contributions_matrix[0,4]
# force plot for one observation
X=["AGE","RACE","PSA","GLEASON"]
shap.force_plot(expected_value, shap_values[0,:], feature_names=X)
The plot I get from the code above is:
[Image: force plot for one observation]
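What does this output mean? Since this is a classification problem, shouldn't the predicted value be a probability (or even the predicted class, 0 or 1)? Yet both the base value and the predicted value are negative.
Can anyone help me with this?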
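What you are seeing is most likely log-odds rather than probabilities. The base value, the individual contributions and the output value of the force plot are then all on the log-odds scale, where a negative number simply means a probability below 0.5.
To get a probability, map the log-odds back with the logistic function. Apply it to the base value and to the total (base value plus the sum of the contributions), not to each contribution separately: the contributions add up only in log-odds space.
p = e^x / (1 + e^x)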
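A minimal pure-Python sketch of that conversion, assuming one row laid out like the predict_contributions output in the question (the four feature columns followed by BiasTerm). The numbers in `row` and the helper name `logodds_to_prob` are hypothetical, made up for illustration, and are not output from the model above.

```python
import math


def logodds_to_prob(x: float) -> float:
    """Logistic function p = e^x / (1 + e^x), arranged to avoid overflow in exp()."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical contributions row, columns AGE, RACE, PSA, GLEASON, BiasTerm
# (made-up numbers for illustration, not real model output)
row = [0.05, -0.02, -0.60, -0.90, -0.35]
*feature_contribs, bias = row

# Contributions are additive in log-odds space: base value + contributions = model output
raw_output = bias + sum(feature_contribs)

base_prob = logodds_to_prob(bias)        # base value expressed as a probability
pred_prob = logodds_to_prob(raw_output)  # predicted P(CAPSULE = 1) for this row

# Negative log-odds correspond to probabilities below 0.5
print(f"base value: {bias:.3f} log-odds -> p = {base_prob:.3f}")
print(f"prediction: {raw_output:.3f} log-odds -> p = {pred_prob:.3f}")
```

With real data, `row` would be one row of `contributions_matrix`; assuming the contributions are indeed log-odds, `pred_prob` should then agree with the `p1` column of `model.predict(h2o_df)` for the same row.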
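If you use SHAP directly (with a model type that shap's TreeExplainer supports), you can get probability-space explanations by setting the model_output parameter, which also requires background data:
shap.TreeExplainer(model, data, model_output='probability')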