Change x labels in a Python sklearn partial dependence plot
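Hi, I fitted a GradientBoostingRegressor on normalized data and plotted the partial dependence for the 10 most important variables. Now I want to plot them against the real, unnormalized values, so I need access to the x labels. How can I do that?
My code is equivalent to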
http://scikit-learn.org/stable/auto_examples/ensemble/plot_partial_dependence.html
For the 3D plot this is easy, because I can transform the axes
axes[0] = (axes[0]*mysd0)+mymean0
axes[1] = (axes[1]*mysd1)+mymean1
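using the mean and standard deviation, but for the subplots I don't know how to access the labels. Thanks.
This is the part of the code I am talking about: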
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
# import path used by older scikit-learn releases
from sklearn.ensemble.partial_dependence import plot_partial_dependence
from sklearn.datasets import fetch_california_housing
cal_housing = fetch_california_housing()
# split 80/20 train-test
X_train, X_test, y_train, y_test = train_test_split(cal_housing.data,
                                                    cal_housing.target,
                                                    test_size=0.2,
                                                    random_state=1)
names = cal_housing.feature_names
clf = GradientBoostingRegressor(n_estimators=100, max_depth=4,
                                learning_rate=0.1, loss='huber',
                                random_state=1)
clf.fit(X_train, y_train)
features = [0, 5, 1]
fig, axs = plot_partial_dependence(clf, X_train, features,
                                   feature_names=names,
                                   n_jobs=3, grid_resolution=50)
fig.suptitle('Partial dependence of house value on nonlocation features\n'
             'for the California housing dataset')
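In this plot, I would like to access and manipulate the x-axis labels...
If I understand correctly, you want to access the labels according to feature importance.
If that is the case, you can do the following: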
import numpy as np
# after fitting the model, get the feature importances
feature_importance = clf.feature_importances_
# make importances relative to the max importance
feature_importance = 100.0 * (feature_importance / feature_importance.max())
# sort the importances and get the indices of the sorting (ascending)
sorted_idx = np.argsort(feature_importance)
# match the indices with the column labels of the x matrix
# important: x must have column names (e.g. a pandas DataFrame) to do this
x.columns[sorted_idx]
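This gives you the feature names in ascending order of importance: the first name is the least important feature and the last name is the most important one.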
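To make the ordering concrete, here is a minimal sketch that does not need a fitted model. The feature names and importance values are made up for illustration, and plain `sorted` stands in for `np.argsort`, so treat it as an illustration of the idea rather than the exact code above.

```python
# Hypothetical values standing in for clf.feature_importances_
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d"]
importances = [0.55, 0.07, 0.10, 0.28]


def rank_features(names, scores):
    """Return the names ordered from least to most important."""
    # indices of scores in ascending order (the role np.argsort plays above)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return [names[i] for i in order]


ranked = rank_features(feature_names, importances)
print(ranked)      # least important first
print(ranked[-1])  # most important feature
```

Reversing the list (`ranked[::-1]`) would give the most important features first, which is probably the more convenient order when picking the top 10 to plot.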
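I found the solution, and it was quite obvious... axs holds all the axes as a list, so each axis can be accessed through it. The axis of the first subplot is therefore axs[0], and its labels can be retrieved with: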
labels = [item.get_text() for item in axs[0].get_xticklabels()]
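However, this did not work in my case: the labels were always empty, even though values were shown in the plot. So I used the axis limits and the following code to create new, back-transformed labels: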
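import numpy as np
import matplotlib.pyplot as plt
fig, axs = plot_partial_dependence(clf, X, features, feature_names=X.columns, grid_resolution=100)
# five evenly spaced tick positions across the x-range of the first subplot
lims = plt.getp(axs[0], "xlim")
myxrange = np.linspace(lims[0], lims[1], 5)
# mean and standard deviation that were used to standardize the feature
mymean = mean4bactransform
mysd = sd4bactransform
# back-transform the tick positions to the original scale
newlabels = [str(round((myx * mysd) + mymean, 2)) for myx in myxrange]
# note: passing axs applies these ticks and labels to every subplot
plt.setp(axs, xticks=myxrange, xticklabels=newlabels)
fig.suptitle('Partial dependence')
plt.subplots_adjust(top=0.9)  # tight_layout causes overlap with suptitle
fig.set_size_inches(10.5, 7.5)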
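One caveat: the snippet above takes the limits from axs[0] and passes all of axs to plt.setp, so every subplot gets the same ticks and the same mean and standard deviation. If each feature was standardized with its own statistics, the labels probably need to be computed per subplot. Below is a rough sketch of that computation in plain Python. The helper name `back_transformed_ticks` and all numbers are hypothetical, and a list comprehension stands in for `np.linspace`.

```python
def back_transformed_ticks(xlim, mean, sd, n_ticks=5, decimals=2):
    """Return (positions, labels) for a single axis.

    positions are evenly spaced in the standardized units the plot uses;
    labels show the same points in original units (x * sd + mean).
    n_ticks must be at least 2.
    """
    lo, hi = xlim
    step = (hi - lo) / (n_ticks - 1)
    positions = [lo + i * step for i in range(n_ticks)]
    labels = [f"{p * sd + mean:.{decimals}f}" for p in positions]
    return positions, labels


# Hypothetical x-limits and (mean, sd) pairs, one entry per subplot
xlims = [(-2.0, 2.0), (-1.0, 3.0)]
stats = [(10.0, 2.0), (50.0, 5.0)]

for xlim, (mean, sd) in zip(xlims, stats):
    positions, labels = back_transformed_ticks(xlim, mean, sd)
    print(positions, labels)
```

With the Matplotlib axes in hand, the limits could come from `ax.get_xlim()` and the result could be applied to each subplot with `ax.set_xticks(positions)` followed by `ax.set_xticklabels(labels)`.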