Getting UnicodeDecodeError when using shap on xgboost

Question

I'm trying to use shap on an xgboost model, but I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 341: invalid start byte

Example:

import shap
from xgboost import XGBClassifier

model = XGBClassifier()
model.fit(X_train, y_train)
explainer = shap.TreeExplainer(model)  # raises the UnicodeDecodeError above

Package versions:

    python==3.6.9
    xgboost==1.1.0
    shap==0.35.0

What is the problem, and how can we fix it?
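The error message itself means that some bytes were decoded as UTF-8 and the byte 0xff was found where a character should start, which is never valid in UTF-8. A minimal stdlib-only sketch reproducing the same kind of failure; the byte string is made up for illustration and is not an actual xgboost model dump:

```python
# Made-up bytes for illustration: 0xff can never start a UTF-8 character.
payload = b"abcd" + bytes([0xFF, 0x00])


def try_decode(data: bytes) -> str:
    """Decode data as UTF-8; on failure return the error text Python produces."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        return str(err)


print(try_decode(payload))
# 'utf-8' codec can't decode byte 0xff in position 4: invalid start byte
```

In the shap/xgboost case the same mechanism appears to be at work: shap 0.35.0 reads the raw model bytes from the booster and runs into bytes it does not expect, which is what the byte-trimming workarounds in the answers address.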

Answer 1

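This appears to be a bug; see https://github.com/slundberg/shap/issues/1215. The issue seems to have been resolved, but the fix may not have been released yet. In any case, I ran into the same problem and worked around it for now by installing xgboost v1.0.0.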

Answer 2

I tried the following solution and it worked.

Package versions:

    python==3.7.7
    xgboost==1.1.1
    shap==0.35.0

The following code works fine for me:

import shap
from xgboost.sklearn import XGBClassifier

xgb = XGBClassifier(random_state=42)
mymodel = xgb.fit(X_train, y_train)

The part that actually solves the problem is the following; don't skip it:

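mybooster = mymodel.get_booster()
# Drop the first 4 bytes of the raw model dump, which shap 0.35.0 fails on
model_bytearray = mybooster.save_raw()[4:]
def myfun(self=None):
    return model_bytearray
# Replace save_raw on this booster so shap receives the trimmed bytes
mybooster.save_raw = myfun

# Shap explainer initialization
shap_ex = shap.TreeExplainer(mybooster)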
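Why the patch works: assigning a function to an attribute of an instance shadows the class method for that one object only, and because it is stored on the instance, Python does not bind it, so it is called without `self` (the `self=None` default covers both cases). A minimal sketch using a hypothetical `FakeBooster` stand-in so it runs without xgboost; the `b"binf"` header bytes are an assumption used only to have something to strip:

```python
class FakeBooster:
    """Hypothetical stand-in for xgboost.Booster; only save_raw() is modelled."""

    def save_raw(self):
        # Assumed layout for illustration: a 4-byte header followed by the model body.
        return bytearray(b"binf" + b"MODEL-BODY")


booster = FakeBooster()
stripped = booster.save_raw()[4:]  # same slicing as in the answer


def patched_save_raw(self=None):
    # Stored on the instance, so Python calls it with no arguments.
    return stripped


booster.save_raw = patched_save_raw  # shadows the class method for this object only

print(booster.save_raw())        # bytearray(b'MODEL-BODY')
print(FakeBooster().save_raw())  # bytearray(b'binfMODEL-BODY'), other instances unaffected
```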

Answer 3

I ran into the same problem with xgboost 1.2.0 and shap 0.35.0.

Here is a complete example that I was able to run without any problems:

import numpy as np
import xgboost as xgb
import shap

# data
np.random.seed(100)
X_train = np.random.random((100, 10))
y_train = np.random.randint(2, size=100)

# model
model = xgb.XGBClassifier(random_state=42)
fitted_model = model.fit(X_train, y_train)

# monkey patch: strip the first 4 bytes of the raw model and make save_raw return them
booster = fitted_model.get_booster()
model_bytearray = booster.save_raw()[4:]
booster.save_raw = lambda: model_bytearray

# shap explainer
explainer = shap.TreeExplainer(booster)
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train)
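The unconditional `[4:]` slice would also cut real model bytes if the installed xgboost did not add a header in the first place (for example the 1.0.0 release mentioned in the first answer). A slightly more defensive sketch, assuming the extra prefix is the 4 bytes `b"binf"` (an assumption; print `booster.save_raw()[:4]` to check what your version actually returns). `patch_save_raw`, `WithHeader` and `WithoutHeader` are hypothetical names for illustration:

```python
def patch_save_raw(booster, header=b"binf"):
    """Hypothetical helper: make booster.save_raw() return its raw bytes with a
    leading `header` removed, and only if that header is actually present."""
    raw = booster.save_raw()
    if raw[: len(header)] == header:
        raw = raw[len(header):]
    # Instance attribute, so it is not bound; accept and ignore any arguments.
    booster.save_raw = lambda *args, **kwargs: raw
    return booster


# Stand-ins for xgboost.Booster so the sketch runs without xgboost installed.
class WithHeader:
    def save_raw(self):
        return bytearray(b"binf" + b"BODY")


class WithoutHeader:
    def save_raw(self):
        return bytearray(b"BODY")


print(patch_save_raw(WithHeader()).save_raw())     # bytearray(b'BODY')
print(patch_save_raw(WithoutHeader()).save_raw())  # bytearray(b'BODY')
```

With a real model this would be used as `explainer = shap.TreeExplainer(patch_save_raw(fitted_model.get_booster()))`. Accepting `*args, **kwargs` in the replacement is a design choice so the patch does not break if the caller passes arguments to `save_raw`.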


Answer 4

!pip install shap==0.36.0
!pip install xgboost==1.3.3

This worked for me.
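If the same code has to run in several environments, one option is to apply the monkey patch only for the version combinations reported as broken in this thread. A sketch of such a gate; `header_patch_needed` is a hypothetical helper, and placing the boundary exactly at xgboost 1.1.0 and shap 0.36.0 is an assumption drawn from the versions mentioned above, not a documented compatibility matrix:

```python
def parse_version(text):
    """Turn '1.1.0' into (1, 1, 0); assumes plain numeric release strings (no 'rc' etc.)."""
    return tuple(int(part) for part in text.split(".")[:3])


def header_patch_needed(xgboost_version, shap_version):
    """Hypothetical heuristic from this thread's reports: shap 0.35.0 failed with
    xgboost 1.1.0, 1.1.1 and 1.2.0, worked with xgboost 1.0.0, and shap 0.36.0
    worked with xgboost 1.3.3."""
    return (parse_version(xgboost_version) >= (1, 1, 0)
            and parse_version(shap_version) < (0, 36, 0))


print(header_patch_needed("1.2.0", "0.35.0"))  # True
print(header_patch_needed("1.3.3", "0.36.0"))  # False
```

In practice the inputs would be `xgboost.__version__` and `shap.__version__`.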
