Splitting a dataframe and plotting with different line styles in Plotly

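I'm new to plotly and pandas, and I'm trying to find an elegant solution. I believe that either I'm not using groupby effectively with plotly, or my data is stacked in a way that prevents me from visualizing it.

To make a test chart, I built a fake dataset by zipping 3 lists together (group, month, spend) and splitting it into "actual" and "forecast" values after a specific month (Mar '20).

When I tried to add a single trace from the forecast df, which contains 3 different groups across several months, I got the monstrosity below.

When I changed the index to group and then used loc to subset the data into 3 separate sets (one per group), I managed to produce the chart below, although it feels like a Frankenstein solution:

Is there a way to plot the initial dataframe and change the line style after a certain point on the x-axis? If not, is there a way to add a trace from a subset that contains three different groups (group1, group2, group3)? I'm not sure that using three separate traces and splitting the data over and over is the best approach, and I suspect there is a more efficient one.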

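Here is how I currently get the separate groups:

# set "group" as the index so each group can be selected with .loc
forecast = forecast.set_index(['group'])

# split into one dataframe per group
group1_forecast = forecast.loc['group1']
group2_forecast = forecast.loc['group2']
group3_forecast = forecast.loc['group3']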

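Here is the (minimal) code for the chart with separate traces:

import plotly.express as px

# actual values: one solid line per group
# (title is a string defined elsewhere)
fig = px.line(actual,
              x="month", y="spend", color='group',
              title=title)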

# group1 trace
fig.add_scatter(
    x=group1_forecast.month,
    y=group1_forecast.spend,
    mode='lines',
    line=dict(shape='linear', color='purple', width=1, dash='dot'),
    connectgaps=True
)

# group2 trace
fig.add_scatter(
    x=group2_forecast.month,
    y=group2_forecast.spend,
    mode='lines',
    line=dict(shape='linear', color='#33C1FF', width=1, dash='dot'),
    connectgaps=True
)

# group3 trace
fig.add_scatter(
    x=group3_forecast.month,
    y=group3_forecast.spend,
    mode='lines',
    line=dict(shape='linear', color='#FFDD33', width=1, dash='dot'),
    connectgaps=True
)

fig.show()

Here is the data:

import pandas as pd

months = ["Mar '19", "Mar '19", "Mar '19",
          "Apr '19", "Apr '19", "Apr '19", 
          "May '19", "May '19", "May '19", 
          "Jun '19", "Jun '19", "Jun '19", 
          "Jul '19", "Jul '19", "Jul '19", 
          "Aug '19", "Aug '19", "Aug '19", 
          "Sep '19", "Sep '19", "Sep '19", 
          "Oct '19", "Oct '19", "Oct '19", 
          "Nov '19", "Nov '19", "Nov '19", 
          "Dec '19", "Dec '19", "Dec '19", 
          "Jan '20", "Jan '20", "Jan '20", 
          "Feb '20", "Feb '20", "Feb '20", 
          "Mar '20", "Mar '20", "Mar '20", 
          "Apr '20", "Apr '20", "Apr '20", 
          "May '20", "May '20", "May '20", 
          "Jun '20", "Jun '20", "Jun '20", 
          "Jul '20", "Jul '20", "Jul '20", 
          "Aug '20", "Aug '20", "Aug '20", 
          "Sep '20", "Sep '20", "Sep '20"]

groups = ['group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3',
           'group1', 'group2', 'group3']

spend = [57, 150, 75, 
        61.5, 156, 78, 
        66, 150, 75, 
        63, 162, 81, 
        69, 163.5, 81.75,
        76.5, 162, 81, 
        78, 168, 84,
        79.5, 168, 84, 
        84, 162, 81, 
        87, 169.5, 84.75, 
        93, 171, 85.5, 
        96, 169.5, 84.75, 
        97.5, 168, 84,
        97.9, 167.7, 84.5,
        98.4, 167.9, 85.1,
        99.9, 168.1, 85.7,
        100.9, 168, 86.1,
        101.6, 168.4, 86.3,
        102.7, 168.8, 86.9]

spend_by_group_list = list(zip(months, groups, spend))

spend_df = pd.DataFrame(spend_by_group_list, columns = ['month', 'group', 'spend'])

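After creating spend_df, I re-implemented your data-processing steps. I'm not 100% sure what the root cause of the problem is, because you didn't provide the exact code that reproduces it. However, splitting the groups with a boolean mask such as spend_df[spend_df["group"] == "groupN"] should work without problems, and it preserves the order of the months.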

# use spend_df created by your code

# split each group into actual and forecast rows
split_month = 13  # the first 13 months (Mar '19 to Mar '20) are actuals
ls_actual = []  # one dataframe per group
ls_forecast = []  # one dataframe per group
for i in range(3):
    df = spend_df[spend_df["group"] == f"group{i+1}"]
    ls_actual.append(df[:split_month])
    ls_forecast.append(df[split_month:])

actual = pd.concat(ls_actual, axis=0)  # stack vertically

# plot
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "browser"  # open the figure in a browser tab

# actual: solid lines, one fixed color per group
ls_colors = ['purple', '#33C1FF', '#FFDD33']
fig = px.line(
    actual, x="month", y="spend", color='group',
    color_discrete_map={f"group{i+1}": ls_colors[i] for i in range(3)},
    title="title"
)

# forecast: one dotted trace per group, reusing that group's color
for i in range(3):
    fig.add_scatter(
        x=ls_forecast[i].month,
        y=ls_forecast[i].spend,
        mode='lines',
        line=dict(shape='linear', color=ls_colors[i], width=1, dash='dot'),
        connectgaps=True
    )

fig.show()

Result: