For loop to plot the top n feature importances in Bokeh (Python) without explicitly typing the column names

Question

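I want to plot the top n features of a RandomForestClassifier() in Bokeh without explicitly specifying the column name in the y argument.

So, first, instead of typing the column name into y, I would like to take the column names and values directly from the classifier's top features.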

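from sklearn.ensemble import RandomForestClassifier

# 'new' is the target column; every other column is a feature
y = df['new']
x = df.drop('new', axis=1)
rf = RandomForestClassifier()
rf.fit(x, y)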

# Extract the top feature from above and plot it in Bokeh
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, show

source = ColumnDataSource(df)

p1 = figure(y_range=(0, 10))

# Below, I would like to use the top feature from the RandomForestClassifier
# instead of explicitly writing the column name, horsePower

p1.line(
    x='x',
    y='horsePower',
    source=source,
    legend_label='Car Blue',  # newer Bokeh releases use legend_label in place of legend
    color='Blue'
)

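Rather than specifying only the first or second feature, can we build a for loop that plots the top n features in Bokeh? I imagine it should look something like this: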

for i in range(5):
    p.line(x = 'x', y = ???? , source=source,) #top feature in randomClassifier
    p.circle(x = 'x', y = ???? , source=source, size = 10)
    row = [p]

output_file('TopFeatures')
show(p)

I have already extracted the top 15 features from the fitted RandomForestClassifier and printed them using:

new_rf = pd.Series(rf.feature_importances_, index=x.columns).sort_values(ascending=False)

print(new_rf[:15])
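
To make the shape of new_rf concrete, here is a minimal sketch with made-up numbers. The importances and the column names other than horsePower are hypothetical stand-ins for rf.feature_importances_ and x.columns, so that the snippet runs without a fitted model. It shows that after sorting, the index of the Series holds the column names in order of importance.

```python
import pandas as pd

# Hypothetical values standing in for rf.feature_importances_ and x.columns
importances = [0.10, 0.55, 0.05, 0.30]
columns = ["weight", "horsePower", "doors", "mpg"]

# Same construction as new_rf above: importances indexed by column name,
# sorted from most to least important
new_rf = pd.Series(importances, index=columns).sort_values(ascending=False)

# The index carries the column names, so slicing it yields the top n names
top_names = new_rf.index[:3].tolist()
print(top_names)
```

Each entry of top_names is a plain column name, so it can be passed wherever a column name would otherwise be typed by hand.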

Answer 1

Simply iterate over the index of the pandas Series new_rf, since its index holds the column names:

# TOP 1 FEATURE
p1.line(
    x='x',
    y=new_rf.index[0],  # column name of the most important feature
    source=source,
    legend_label='Car Blue',  # newer Bokeh releases use legend_label in place of legend
    color='Blue'
)

# TOP 5 FEATURES
# Each iteration writes one HTML file and opens one figure per feature
for i in new_rf[:5].index:

    output_file("TopFeatures_{}.html".format(i))

    p = figure(y_range=(0, 10))
    p.line(x='x', y=i, source=source)
    p.circle(x='x', y=i, source=source, size=10)

    show(p)
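
The loop above produces one figure per feature. If the goal is a single figure with all top features overlaid, a variation along the following lines may work. This is only a sketch under assumptions: plot_top_features is a hypothetical helper, the small DataFrame and the ranked Series are made-up stand-ins for df and new_rf, the data is assumed to contain an 'x' column, and the column names are assumed to be strings so they can serve as legend labels. It uses scatter rather than circle for the markers, since scatter draws circles by default.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category10
from bokeh.plotting import figure


def plot_top_features(df, ranked, n=5):
    """Overlay the n highest-ranked columns of df on one figure.

    ranked is a Series like new_rf: importances indexed by column name,
    already sorted in descending order. df must contain an 'x' column.
    """
    source = ColumnDataSource(df)
    p = figure(title="Top {} features".format(n))
    palette = Category10[10]  # ten distinct colors, so at most ten features are drawn
    for color, name in zip(palette, ranked.index[:n]):
        p.line(x="x", y=name, source=source, color=color, legend_label=name)
        p.scatter(x="x", y=name, source=source, color=color, size=10)
    return p


# Made-up data standing in for df and new_rf
df = pd.DataFrame({
    "x": [0, 1, 2, 3],
    "horsePower": [1.0, 2.5, 3.0, 4.5],
    "mpg": [4.0, 3.5, 2.0, 1.0],
    "weight": [2.0, 2.0, 3.0, 3.0],
})
ranked = pd.Series({"horsePower": 0.6, "mpg": 0.3, "weight": 0.1})

p = plot_top_features(df, ranked, n=2)
```

Calling output_file("TopFeatures.html") followed by show(p) would then write and open a single page. The helper returns the figure instead of showing it, which leaves the choice of output to the caller.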

Tags: python, plot, pandas, bokeh