How to globally access data selected by lasso?

To illustrate my question, I adapted the example from here.

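Suppose I have several datasets of different dimensions that I want to plot in the same figure and then make selections from. How would I store the selected data in, e.g., a dictionary (or any other structure such as a numpy array, a pandas dataframe, ...)? So a data selection might look like this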

Now, is there a way to store the selected data in the following form:

{'x': ['x_1_selected', 'x_2_selected', '...', 'x_n_selected'],
 'x1': ['x1_1_selected', 'x1_2_selected', '...', 'x1_n_selected'],
 'y': ['y_1_selected', 'y_2_selected', '...', 'y_n_selected'],
 'y1': ['y1_1_selected', 'y1_2_selected', '...', 'y1_n_selected']}

?

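To provide some background: the user should be able to load different datasets, visualize them, and make a selection with the lasso. The selected data should then be used by other functions, so it needs to be stored in a structure that can easily be passed to other functions I define.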

Is there, for example, a way to update s2 and s2b globally (see the code below)? When I check s2.data and s2b.data now, they still show empty lists.

Here is the code:

from random import random

from bokeh.layouts import row
from bokeh.models import CustomJS, ColumnDataSource
from bokeh.plotting import figure, output_file, show

output_file("callback.html")

x = [random() for _ in range(500)]
y = [random() for _ in range(500)]
x2 = [random() for _ in range(100)]
y2 = [random() for _ in range(100)]

# the two different data sources of different length
s1 = ColumnDataSource(data=dict(x=x, y=y))
s1b = ColumnDataSource(data=dict(x2=x2, y2=y2))

# the figure with all source data where we make selections
p1 = figure(plot_width=400, plot_height=400, tools="lasso_select", title="Select Here")
p1.circle('x', 'y', source=s1, alpha=0.6, color='red')
p1.circle('x2', 'y2', source=s1b, alpha=0.6, color='black')

# second figure which is empty initially where we show the selected datapoints
s2 = ColumnDataSource(data=dict(x=[], y=[]))
s2b = ColumnDataSource(data=dict(x2=[], y2=[]))
p2 = figure(plot_width=400, plot_height=400, x_range=(0, 1), y_range=(0, 1),
            tools="", title="Watch Here")
p2.circle('x', 'y', source=s2, alpha=0.6, color='red')
p2.circle('x2', 'y2', source=s2b, alpha=0.6, color='black')

# individual callback for different datasets
s1.callback = CustomJS(args=dict(s2=s2), code="""
        var inds = cb_obj.selected['1d'].indices;
        var d1 = cb_obj.data;
        var d2 = s2.data;
        d2['x'] = []
        d2['y'] = []
        for (var i = 0; i < inds.length; i++) {
            d2['x'].push(d1['x'][inds[i]])
            d2['y'].push(d1['y'][inds[i]])
        }
        s2.change.emit();
    """)

s1b.callback = CustomJS(args=dict(s2b=s2b), code="""
        var inds = cb_obj.selected['1d'].indices;
        var d1 = cb_obj.data;
        var d2 = s2b.data;
        d2['x2'] = []
        d2['y2'] = []
        for (var i = 0; i < inds.length; i++) {
            d2['x2'].push(d1['x2'][inds[i]])
            d2['y2'].push(d1['y2'][inds[i]])
        }
        s2b.change.emit();
    """)

layout = row(p1, p2)

show(layout)

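Given that you mention storing the selected data in a numpy array or a pandas dataframe, I assume you actually intend to use bokeh serve. In that case you don't need to write any JS code, because every Bokeh DataSource has a selected property to which you can attach a Python callback. Just replace your callback code with: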

def attach_selection_callback(main_ds, selection_ds):
    def cb(attr, old, new):
        new_data = {c: [] for c in main_ds.data}
        for idx in new['1d']['indices']:
            for column, values in main_ds.data.items():
                new_data[column].append(values[idx])
        # Setting at the very end to make sure that we don't trigger multiple events
        selection_ds.data = new_data

    main_ds.on_change('selected', cb)


attach_selection_callback(s1, s2)
attach_selection_callback(s1b, s2b)
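
To get from the per-source selections to the single dictionary asked about in the question, something along these lines might work. It is only a sketch: take_rows and merge_selections are hypothetical helper names, and the plain dicts and index lists below stand in for s1.data, s1b.data and the indices reported by the lasso. It deliberately uses no Bokeh calls, so how the indices arrive depends on your Bokeh version (the callback above reads them from new['1d']['indices']; in Bokeh 1.0 and later they are, as far as I know, exposed as source.selected.indices instead).

```python
def take_rows(columns, indices):
    """Return a dict with the same keys as `columns`, keeping only the rows in `indices`."""
    return {name: [values[i] for i in indices] for name, values in columns.items()}


def merge_selections(selections):
    """Merge several per-source selections into one flat dict of columns."""
    merged = {}
    for selection in selections:
        for name, values in selection.items():
            # Column names must be unique across sources, otherwise data would be overwritten.
            if name in merged:
                raise ValueError("duplicate column name: %r" % name)
            merged[name] = list(values)
    return merged


# Stand-ins (assumptions) for s1.data, s1b.data and the selected indices of each source.
data_a = {"x": [0.1, 0.2, 0.3, 0.4], "y": [1.0, 2.0, 3.0, 4.0]}
data_b = {"x2": [0.5, 0.6], "y2": [5.0, 6.0]}

selected = merge_selections([
    take_rows(data_a, [0, 2]),
    take_rows(data_b, [1]),
])
print(selected)
```

Because the columns may have different lengths per source, the result stays a dict of lists rather than a single rectangular table. Inside a bokeh serve app, the same two helpers could be called from the Python callback so that a module-level dict is refreshed on every selection.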