Plotly bubble chart from pandas crosstab

Question

How can I plot a bubble chart from a dataframe that was created with pandas crosstab from another dataframe?

Imports:

import pandas as pd
import plotly as py
import plotly.graph_objects as go
from plotly.subplots import make_subplots

The crosstab is created with:

df = pd.crosstab(raw_data['Speed'], raw_data['Height'].fillna('n/a'))

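df contains mostly zeros, but wherever a number appears I want a point whose size is controlled by the value. I want the index values on the x-axis and the column names on the y-axis.

df looks like this: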

         10    20    30    40    50
1000     0     0    0      0     5
1100     0     0    0      7     0
1200     1     0    3      0     0
1300     0     0    0      0     0
1400     5     0    0      0     0

I have tried both scatter and Scatter, like this:

fig.add_trace(go.Scatter(x=df.index.values, y=df.columns.values, size=df.values,
                         mode='lines'),
              row=1, col=3)

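This returned TypeError: 'module' object is not callable.

Any help is greatly appreciated. Thanks.

Update

The answer below is close to what I ended up with; the main difference is that I referenced 'Speed' in the melt line: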

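# each step returns a new frame, so the result has to be assigned back
df = df.reset_index()
df = df.melt(id_vars="Speed")
df = df.rename(columns={"variable": "Height",
                        "value": "Count"})
# keep only the Speed/Height pairs that actually occur
df = df[df["Count"] != 0]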

scale=1000

fig.add_trace(go.Scatter(x=df["Speed"], y=df["Height"],mode='markers',marker_size=df["Count"]/scale),
              row=1, col=3)

This works, but my main problem now is that the dataset is large and plotly struggles to handle it.

Update 2

Using Scattergl lets Plotly handle the large dataset well!

Answer 1

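I suggest using the tidy format to represent your data. A dataframe is said to be tidy if and only if:

Each row is an observation
Each column is a variable
Each value has its own cell

To create a tidier dataframe, you can do the following: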

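df = pd.crosstab(raw_data["Speed"], raw_data["Height"])
df = df.reset_index()
# turn every Height column into rows: one row per Speed/Height pair
df = df.melt(id_vars="Speed", var_name="Height", value_name="Counts")
# drop the pairs that never occur
df = df[df["Counts"] > 0]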

   Speed  Height  Counts
0   1000      10       2
1   1100      20       1
2   1200      10       1
3   1200      30       1
4   1300      40       1
5   1400      50       1
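
The reshaping step can be checked end to end on a small invented sample. The sketch below assumes a hypothetical raw_data frame with Speed and Height columns (the values are made up for illustration and are not the asker's data) and shows that melting the crosstab and dropping zero counts leaves one row per observed Speed/Height pair.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
raw_data = pd.DataFrame({
    "Speed":  [1000, 1000, 1100, 1200, 1200, 1400],
    "Height": [10,   10,   20,   10,   30,   50],
})

# Wide table: one row per Speed, one column per Height, cells hold counts.
wide = pd.crosstab(raw_data["Speed"], raw_data["Height"])

# Long (tidy) table: one row per Speed/Height pair.
tidy = wide.reset_index().melt(
    id_vars="Speed", var_name="Height", value_name="Counts"
)

# Drop the pairs that never occur so they are not drawn as empty bubbles.
tidy = tidy[tidy["Counts"] > 0].reset_index(drop=True)

print(tidy)
```

Note that melt orders the rows by Height column first, so the row order may differ from a table sorted by Speed.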

The next step is the actual plotting.

# increasing scale makes the bubbles larger
scale = 10
# create the scatter plot; the column is named "Counts" (capital C)
scatter = go.Scatter(
    x=df["Speed"],
    y=df["Height"],
    marker_size=df["Counts"] * scale,
    mode='markers')
fig = go.Figure(scatter)
fig.show()

This produces a plot like the one shown below.

Answer 2

If that is the case, you can use plotly.express. This is very similar to @Erik's answer, but it should not return an error.

import pandas as pd
import plotly.express as px
from io import StringIO

txt = """
        10    20    30    40    50
1000     0     0    0      0     5
1100     0     0    0      7     0
1200     1     0    3      0     0
1300     0     0    0      0     0
1400     5     0    0      0     0
"""

# sep=r"\s+" splits on whitespace; delim_whitespace is deprecated in recent pandas
df = pd.read_csv(StringIO(txt), sep=r"\s+")

df = df.reset_index()\
       .melt(id_vars="index")\
       .rename(columns={"index":"Speed",
                        "variable":"Height",
                        "value":"Count"})

fig = px.scatter(df, x="Speed", y="Height",size="Count")
fig.show()

Update: if you run into an error, check your pandas version with pd.__version__ and try running this line by line

df = pd.read_csv(StringIO(txt), sep=r"\s+")

df = df.reset_index()

df = df.melt(id_vars="index")

df = df.rename(columns={"index":"Speed",
                        "variable":"Height",
                        "value":"Count"})

and report which line it breaks on.
