How to globally access data selected by lasso?
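To illustrate my problem, I adapted the example from here.
Suppose I have several datasets of different dimensions that I want to plot in the same figure and then make a selection from. How would I store the selected data in, e.g., a dictionary (or any other structure such as a numpy array, a pandas dataframe, ...)? A data selection might look like this
Is there a way to store the selected data in the following form: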
{'x': ['x_1_selected', 'x_2_selected', '...', 'x_n_selected'],
'x1': ['x1_1_selected', 'x1_2_selected', '...', 'x1_n_selected'],
'y': ['y_1_selected', 'y_2_selected', '...', 'y_n_selected'],
'y1': ['y1_1_selected', 'y1_2_selected', '...', 'y1_n_selected']}
?
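Some background:
The user should be able to load different datasets, visualize them, and make a selection with the lasso. The selected data should then be used by other functions, so it needs to be stored in a structure that can easily be passed to other functions I define.
Is there, for example, a way to globally update s2 and s2b (see the code below)? When I currently check s2.data and s2b.data, they still show empty lists.
Here is the code: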
from random import random
from bokeh.layouts import row
from bokeh.models import CustomJS, ColumnDataSource
from bokeh.plotting import figure, output_file, show
output_file("callback.html")
x = [random() for x in range(500)]
y = [random() for y in range(500)]
x2 = [random() for x2 in range(100)]
y2 = [random() for y2 in range(100)]
# the two different data sources of different length
s1 = ColumnDataSource(data=dict(x=x, y=y))
s1b = ColumnDataSource(data=dict(x2=x2, y2=y2))
# the figure with all source data where we make selections
p1 = figure(plot_width=400, plot_height=400, tools="lasso_select", title="Select Here")
p1.circle('x', 'y', source=s1, alpha=0.6, color='red')
p1.circle('x2', 'y2', source=s1b, alpha=0.6, color='black')
# second figure which is empty initially where we show the selected datapoints
s2 = ColumnDataSource(data=dict(x=[], y=[]))
s2b = ColumnDataSource(data=dict(x2=[], y2=[]))
p2 = figure(plot_width=400, plot_height=400, x_range=(0, 1), y_range=(0, 1),
            tools="", title="Watch Here")
p2.circle('x', 'y', source=s2, alpha=0.6, color='red')
p2.circle('x2', 'y2', source=s2b, alpha=0.6, color='black')
# individual callback for different datasets
s1.callback = CustomJS(args=dict(s2=s2), code="""
    var inds = cb_obj.selected['1d'].indices;
    var d1 = cb_obj.data;
    var d2 = s2.data;
    d2['x'] = []
    d2['y'] = []
    for (i = 0; i < inds.length; i++) {
        d2['x'].push(d1['x'][inds[i]])
        d2['y'].push(d1['y'][inds[i]])
    }
    s2.change.emit();
""")
s1b.callback = CustomJS(args=dict(s2b=s2b), code="""
    var inds = cb_obj.selected['1d'].indices;
    var d1 = cb_obj.data;
    var d2 = s2b.data;
    d2['x2'] = []
    d2['y2'] = []
    for (i = 0; i < inds.length; i++) {
        d2['x2'].push(d1['x2'][inds[i]])
        d2['y2'].push(d1['y2'][inds[i]])
    }
    s2b.change.emit();
""")
layout = row(p1, p2)
show(layout)
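Given that you mention storing the selected data in a numpy array or a pandas dataframe, I assume you actually intend to use bokeh serve. In that case you don't need to write any JS code, because every Bokeh DataSource has a selected property to which you can attach a Python callback. Just replace your callback code with: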
def attach_selection_callback(main_ds, selection_ds):
    def cb(attr, old, new):
        # Start with an empty list for every column of the main source
        new_data = {c: [] for c in main_ds.data}
        for idx in new['1d']['indices']:
            for column, values in main_ds.data.items():
                new_data[column].append(values[idx])
        # Setting at the very end to make sure that we don't trigger multiple events
        selection_ds.data = new_data

    main_ds.on_change('selected', cb)


attach_selection_callback(s1, s2)
attach_selection_callback(s1b, s2b)
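To get from the per-source selections to the single dictionary asked for in the question, something like the following might work. It is only a sketch: `select_rows` and `merge_selections` are hypothetical helper names, not part of Bokeh, and plain dicts stand in for `s1.data` and `s1b.data` so the snippet runs without a server. It also assumes column names are unique across datasets, as with x/y and x2/y2 above.

```python
from typing import Dict, Iterable, List, Mapping, Sequence


def select_rows(data: Mapping[str, Sequence], indices: Iterable[int]) -> Dict[str, List]:
    """Return a column dict holding only the rows at the given indices."""
    idx = list(indices)
    return {column: [values[i] for i in idx] for column, values in data.items()}


def merge_selections(selections: Iterable[Mapping[str, Sequence]]) -> Dict[str, List]:
    """Combine per-dataset selections into one dict keyed by column name.

    Assumption: column names do not repeat across datasets.
    """
    merged: Dict[str, List] = {}
    for selection in selections:
        for column, values in selection.items():
            if column in merged:
                raise ValueError(f"duplicate column name: {column!r}")
            merged[column] = list(values)
    return merged


# Plain dicts stand in for s1.data and s1b.data; the index lists stand in
# for whatever the lasso selection reports for each source.
data_a = {"x": [0.1, 0.2, 0.3], "y": [1.0, 2.0, 3.0]}
data_b = {"x2": [0.5, 0.6], "y2": [5.0, 6.0]}

selected = merge_selections([
    select_rows(data_a, [0, 2]),
    select_rows(data_b, [1]),
])
print(selected)
# {'x': [0.1, 0.3], 'y': [1.0, 3.0], 'x2': [0.6], 'y2': [6.0]}
```

Inside a Bokeh server app, the selection callback could call `select_rows(main_ds.data, ...)` with the selected indices and keep the merged result in a module-level variable that other functions read. The columns may have different lengths per dataset, so a dict of lists fits better than a single dataframe here; one dataframe per source would be the alternative.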