Pandas group by and sum for each month

Question

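I am building a Plotly Dash app that has a dropdown to select per year. In pandas, I want to group the column values by year and then sum each column for that year. That way I will be able to select a month in the dropdown, which will then filter the data for that month to update a Sankey diagram. How can I do this?

Columns: ['Month']; ['Value1']; ['Value2'] ...; ['Value20']

I think I have to create a new data frame in which I group the columns by month, and the values are the sum of each column per year:

['Month'] = 1, 2, 3, 4, ...
['Value1'] = (sum for month 1), (sum for month 2), (sum for month 3), ...

Sorry if I have not explained this well!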

Answer 1

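You have not specified any data or data structure, so I have constructed one that defines the nodes, the valid flows, and the actual flows by date.
From this structure, I built year and month filter dropdowns using https://plotly.com/python/dropdowns/. Note that these are independent filters; that is how the static structure of updatemenus works.
In this solution, the data frame column names are generally the same as the Sankey link attribute names. If that is not the case for your data structure, the solution can be modified to use a dict instead of a list.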
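The dict variant mentioned above could look roughly like the following. This is only a sketch: the column names `from_node`, `to_node`, `amount` and `when`, the `LINK_COLUMNS` mapping and the `link_update` helper are all hypothetical and stand in for whatever your own data uses.

```python
import pandas as pd

# Hypothetical mapping: Sankey link attribute -> column name in your data.
LINK_COLUMNS = {
    "source": "from_node",
    "target": "to_node",
    "value": "amount",
    "label": "when",
}


def link_update(df, mask, columns=LINK_COLUMNS):
    """Build the dict of link attributes for the rows selected by mask."""
    return {attr: df.loc[mask, col].values for attr, col in columns.items()}


# Made-up flows, purely for illustration.
flows = pd.DataFrame(
    {
        "from_node": [0, 1, 0],
        "to_node": [2, 2, 3],
        "amount": [3.0, 4.0, 5.0],
        "when": pd.to_datetime(["2020-07-31", "2020-07-31", "2020-08-31"]),
    }
)

# Link attributes for July only; this dict has the same shape as the second
# element of a "restyle" button's args.
update = link_update(flows, flows["when"].dt.month == 7)
print(update["source"], update["value"])
```

With a mapping like this, only `LINK_COLUMNS` needs to change when the column names change.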

Nodes

	0
0	A0
1	A1
2	B0
3	B1
4	C0
5	C1

Valid flows `dfflow`

source	target	source_name	target_name
0	2	A0	B0
1	2	A1	B0
0	3	A0	B1
1	3	A1	B1
2	4	B0	C0
3	4	B1	C0
2	5	B0	C1
3	5	B1	C1

Sample flows by date `df`

source	target	source_name	target_name	value	date
0	2	A0	B0	3.58321	2020-07-31 00:00:00
1	2	A1	B0	4.74713	2020-07-31 00:00:00
0	3	A0	B1	4.96593	2020-07-31 00:00:00
1	3	A1	B1	3.64883	2020-07-31 00:00:00
2	4	B0	C0	4.67168	2020-07-31 00:00:00
3	4	B1	C0	4.73339	2020-07-31 00:00:00
2	5	B0	C1	1.85678	2020-07-31 00:00:00
3	5	B1	C1	1.76691	2020-07-31 00:00:00
0	2	A0	B0	4.85048	2020-08-31 00:00:00
1	2	A1	B0	3.74573	2020-08-31 00:00:00
0	3	A0	B1	4.40529	2020-08-31 00:00:00
1	3	A1	B1	4.84975	2020-08-31 00:00:00
2	4	B0	C0	1.82983	2020-08-31 00:00:00
3	4	B1	C0	2.87512	2020-08-31 00:00:00
2	5	B0	C1	4.59346	2020-08-31 00:00:00
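The year and month filters in the code below are boolean masks on the `date` column of a table like this one. The sketch below uses a small made-up frame (the values are not taken from the table) to show why independent filters matter: a month mask on its own matches that month in every year, so the two masks have to be combined to get a single month of a single year.

```python
import pandas as pd

# Made-up flows; the last row is July of a different year.
flows = pd.DataFrame(
    {
        "source": [0, 1, 0, 0],
        "target": [2, 2, 2, 2],
        "value": [3.5, 4.5, 4.0, 2.0],
        "date": pd.to_datetime(
            ["2020-07-31", "2020-07-31", "2020-08-31", "2021-07-31"]
        ),
    }
)

year_mask = flows["date"].dt.year == 2020
month_mask = flows["date"].dt.month == 7

only_month = flows[month_mask]                  # July of every year
year_and_month = flows[year_mask & month_mask]  # July 2020 only

print(len(only_month), len(year_and_month))
```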

import pandas as pd
import numpy as np
import plotly.graph_objects as go
import itertools

nodes = (
    pd.DataFrame(itertools.product(list("ABC"), range(2)))
    .astype(str)
    .apply("".join, axis=1)
)

# NOTE: this first `df` is not used; it is overwritten by the date-based `df` built further down
df = pd.DataFrame(
    itertools.combinations(nodes.index, 2), columns=["source", "target"]
).pipe(lambda d: d.assign(value=np.random.randint(1, 5, len(d))))

# create dataframe of flows between nodes: A nodes to B nodes, B nodes to C nodes
dfflow = pd.concat(
    [
        pd.DataFrame(
            itertools.product(
                nodes.loc[nodes.str[0] == l[0]].index.tolist(),
                nodes.loc[nodes.str[0] == l[1]].index.tolist(),
            ),
            columns=["source", "target"],
        )
        for l in ["AB", "BC"]
    ]
)

# for purpose of human readability, put node names on flows
dfflow = dfflow.merge(
    nodes.rename("source_name"), left_on="source", right_index=True
).merge(nodes.rename("target_name"), left_on="target", right_index=True)

# create some values against flows for a range of month-end dates
# (freq="M" means month end; pandas 2.2+ prefers the alias "ME")
df = pd.concat(
    [
        dfflow.assign(value=np.random.uniform(1, 5, len(dfflow)), date=d)
        for d in pd.date_range("1-jul-2020", freq="M", periods=14)
    ]
)

# utility function to build a dropdown menu with one button per unique
# filter value, plus an "All" button (encoded by the sentinel -99)
def menu(df, filter=pd.Series([0]), y=1):
    label = np.concatenate([filter.unique(), [-99]])
    return {
        "y":y,
        "buttons": [
            {
                "label": str(l) if not l==-99 else "All",
                "method": "restyle",
                "args": [
                    "link",
                    {
                        "label"
                        if attr == "date"
                        else attr: df.loc[filter == l , attr].values if l!=-99 else df.loc[:,attr].values
                        for attr in ["source", "target", "value", "date"]
                    },
                ],
            }
            for l in label
        ]
    }


# build the sankey diagram and the required filter drop downs
go.Figure(
    go.Sankey(
        node={"label": nodes.values},
        link={
            "source": df["source"],
            "target": df["target"],
            "value": df["value"],
            "label": df["date"],
        },
    )
).update_layout(
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
    updatemenus=[
        menu(
            df,
            filter=df["date"].dt.year,
        ),
        menu(
            df,
            filter=df["date"].dt.month, y=.9
        ),
    ],
)
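For the grouping and summing itself, a minimal sketch for the column layout described in the question might look like this. The data is made up, and only two of the twenty value columns are shown; `data`, `monthly` and `selected` are illustrative names.

```python
import pandas as pd

# Made-up data in the layout described in the question
# (a Month column plus Value1 ... Value20; only two value columns here).
data = pd.DataFrame(
    {
        "Month": [1, 1, 2, 2, 3],
        "Value1": [10, 20, 5, 5, 7],
        "Value2": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# One row per month, each value column summed over that month's rows.
monthly = data.groupby("Month", as_index=False).sum()

# What a dropdown selection of month 2 would pick out.
selected = monthly.loc[monthly["Month"] == 2]

print(monthly)
```

If the data also has a year column, grouping by both, for example `data.groupby(["Year", "Month"], as_index=False).sum()` with an assumed `Year` column, keeps the months of different years apart.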

Tags: dataframe, pandas, plotly, plotly-dash, plotly-python
