What is wrong with Bokeh image() plotting? It succeeds but shows no graph
I initially have a Spark DataFrame with data like this:
+-------------------+--------------+------+-----+
|window_time        |delayWindowEnd|values|index|
+-------------------+--------------+------+-----+
|2022-01-24 18:00:00|999           |999   |2    |
|2022-01-24 19:00:00|999           |999   |1    |
|2022-01-24 20:00:00|999           |999   |3    |
|2022-01-24 21:00:00|999           |999   |4    |
|2022-01-24 22:00:00|999           |999   |5    |
|2022-01-24 18:00:00|998           |998   |4    |
|2022-01-24 19:00:00|998           |998   |5    |
|2022-01-24 20:00:00|998           |998   |3    |
I want to plot it as a heatmap in Apache Zeppelin, using the following code:
%spark.pyspark
import bkzep
import numpy as np
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, ColorBar, LogColorMapper
from bokeh.layouts import gridplot
from pyspark.sql.functions import col, coalesce, lit, monotonically_increasing_id
from pyspark.sql import DataFrame
from pyspark.sql.functions import *
output_notebook(notebook_type='zeppelin')
Then:
%pyspark
from pyspark.sql.functions import *
def plot_summaries(sensor, dfName):
    df = sqlContext.table(dfName)
    pdf = df.toPandas()
    source = ColumnDataSource(pdf)
    color_mapper = LogColorMapper(palette="Viridis256", low=1, high=10)
    plot = figure(toolbar_location=None, x_axis_type='datetime')
    plot.image(x='window_time', y='delayWindowEnd', source=source, image='index', dw=1, dh=1, color_mapper=color_mapper)
    color_bar = ColorBar(color_mapper=color_mapper, label_standoff=12)
    plot.add_layout(color_bar, 'right')
    show(gridplot([plot], ncols=1, plot_width=1000, plot_height=400))
sensors = [
    "all"
]
And finally:
%pyspark
from pyspark.sql.functions import *
keyCol = "month_day_hour"
sensors = [
"all"]
for sensor in sensors:
    plot_summaries(sensor, "maxmin2")
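It runs successfully, but no plot is displayed.
This may be due to misuse of the parameters.
Is it possible to use a DataFrame column as the image argument (with the other two columns serving as the x and y axes)? Are dh and dw initialized correctly? Is it OK to use timestamps for the x axis?
In case the cause is browser rendering, the following JS error appears: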
polyfills.d42c9551b0788083cd69.js:1 Uncaught Error: Error rendering Bokeh model: could not find #fb19be38-e25a-4ebf-a488-593cd2e9a4d6 HTML tag
at o (bokeh-1.3.4.min.js:31:143801)
at Object.n._resolve_root_elements (bokeh-1.3.4.min.js:31:144274)
at Object.n.embed_items_notebook (bokeh-1.3.4.min.js:31:147281)
at embed_document (<anonymous>:6:20)
at <anonymous>:15:9
at e.invokeTask (polyfills.d42c9551b0788083cd69.js:1:8063)
at t.runTask (polyfills.d42c9551b0788083cd69.js:1:3241)
at t.invokeTask (polyfills.d42c9551b0788083cd69.js:1:9170)
at i.useG.invoke (polyfills.d42c9551b0788083cd69.js:1:9061)
at n.args.<computed> (polyfills.d42c9551b0788083cd69.js:1:38948)
Meanwhile, the response from the Zeppelin backend, carrying the execution and plot results, reaches the browser over the websocket and looks perfectly fine:
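The answer was given here: https://discourse.bokeh.org/t/cant-render-heatmap-data-for-apache-zeppelins-pyspark-dataframe/8844
The error is explained in detail at the link above. In short, I was not passing the required 2D array to Bokeh; I had to use pandas' pivot and numpy to generate it. Here is the solution: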
import pandas as pd
dft = sqlContext.table(dfName)
pdf = dft.toPandas()
rowIDs = pdf['values']
colIDs = pdf['window_time']
# Pivot the long-form rows into a 2D grid: rows are 'values', columns are 'window_time'
A = pdf.pivot_table('index', 'values', 'window_time', fill_value=0)
source = ColumnDataSource(data={
    'x': [pd.to_datetime('Jan 24 2022')],  # left most
    'y': [0],  # bottom most
    'dw': [pdf['window_time'].max() - pdf['window_time'].min()],  # TOTAL width of image
    # 'dh': [pdf['delayWindowEnd'].max()],  # TOTAL height of image
    'dh': [1000],  # TOTAL height of image
    'im': [A.to_numpy()],  # 2D array using to_numpy() method on pivoted df
})
color_mapper = LogColorMapper(palette="Viridis256", low=1, high=20)
plot = figure(toolbar_location=None, x_axis_type='datetime')
plot.image(x='x', y='y', source=source, image='im', dw='dw', dh='dh', color_mapper=color_mapper)
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=12)
plot.add_layout(color_bar, 'right')
show(gridplot([plot], ncols=1, plot_width=1000, plot_height=400))
The result looks good to me:
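For anyone who wants to check the pivot step without Spark, Zeppelin, or Bokeh, here is a minimal sketch using only pandas and numpy. The sample frame is a hypothetical stand-in that copies the rows shown in the question, and deriving x, y, dw and dh from the data (rather than hard-coding them as I did above) is my own assumption about what is reasonable, not something the linked answer prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the rows shown in the question.
pdf = pd.DataFrame({
    "window_time": pd.to_datetime([
        "2022-01-24 18:00:00", "2022-01-24 19:00:00", "2022-01-24 20:00:00",
        "2022-01-24 21:00:00", "2022-01-24 22:00:00",
        "2022-01-24 18:00:00", "2022-01-24 19:00:00", "2022-01-24 20:00:00",
    ]),
    "values": [999, 999, 999, 999, 999, 998, 998, 998],
    "index": [2, 1, 3, 4, 5, 4, 5, 3],
})

# Long form -> 2D grid: one row per distinct 'values', one column per
# distinct 'window_time'. Cells with no observation are filled with 0.
grid = pdf.pivot_table(values="index", index="values",
                       columns="window_time", fill_value=0)
image = grid.to_numpy()

# Assumption: place the image using the extent of the data itself.
x = pdf["window_time"].min()        # left edge
y = pdf["values"].min()             # bottom edge
dw = pdf["window_time"].max() - x   # total width, as a pandas Timedelta
dh = pdf["values"].max() - y + 1    # total height, one unit per row (assumed)

# One image with one placement: every field is a one-element list.
# This dict has the shape that was passed to ColumnDataSource above.
data = {"x": [x], "y": [y], "dw": [dw], "dh": [dh], "im": [image]}

print(image.shape)
```

With the sample rows the grid has 2 rows and 5 columns. The pivot sorts the row labels in ascending order, so the first row of the array corresponds to the smallest 'values' entry, which is the row drawn at the bottom edge y.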