How to build a pandas dataframe in a recursive function?

Question

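I am trying to implement the 'Bottom-Up Computation' algorithm from data mining (https://www.aaai.org/Papers/FLAIRS/2003/Flairs03-050.pdf).

I need to create a dataframe with the 'pandas' library and pass it to a recursive function, which should also return a dataframe as output. So far I can only output the last column, because I don't know how to build the dataframe dynamically.

Here is the Python program: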

import pandas as pd

def project_data(df, d):
    return df.iloc[:, d]

def select_data(df, d, val):
    col_name = df.columns[d]
    return df[df[col_name] == val]

def remove_first_dim(df):
    return df.iloc[:, 1:]

def slice_data_dim0(df, v):
    df_temp = select_data(df, 0, v)
    return remove_first_dim(df_temp)

def buc(df):
    dims = df.shape[1]
    if dims == 1:
        input_sum = sum(project_data(df, 0))
        print(input_sum)
    else:
        dim_vals = set(project_data(df, 0).values)

        for dim_val in dim_vals:
            sub_data = slice_data_dim0(df, dim_val)
            buc(sub_data)
        sub_data = remove_first_dim(df)
        buc(sub_data)


data = {'A':[1,1,1,1,2],
        'B':[1,1,2,3,1],
        'M':[10,20,30,40,50]
        }
    
df = pd.DataFrame(data, columns = ['A','B','M'])
buc(df)

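This only prints the sums, i.e. the values of the last column, one per call.

What I need instead is a dataframe like this (the formatting does not matter, as long as it is a dataframe):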

    A   B   M
0   1   1   30
1   1   2   30
2   1   3   40
3   1   ALL 100
4   2   1   50
5   2   ALL 50
6   ALL 1   80
7   ALL 2   30
8   ALL 3   40
9   ALL ALL 150

How can I achieve this?

Answer 1

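Unfortunately pandas has no built-in function for subtotals, so the trick is to compute them separately and concatenate them with the original dataframe.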

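from itertools import combinations

import numpy as np
import pandas as pd

# `df` is the dataframe built in the question.
dim = ['A', 'B']   # dimension columns
vals = ['M']       # measure columns to sum

df = (
    pd.concat(
        [df]
        # subtotals: one groupby per non-empty proper subset of the dimensions
        + [df.groupby(list(gr), as_index=False)[vals].sum()
           for r in range(len(dim) - 1)
           for gr in combinations(dim, r + 1)]
        # total: a constant key puts every row into a single group
        + [df.groupby(np.zeros(len(df)))[vals].sum()]
    )
    .sort_values(dim)          # rows with missing dimensions (NaN) sort last
    .reset_index(drop=True)
    .fillna("ALL")             # a missing dimension stands for "all values"
)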

Output:

      A    B    M
0     1    1   10
1     1    1   20
2     1    2   30
3     1    3   40
4     1  ALL  100
5     2    1   50
6     2  ALL   50
7   ALL    1   80
8   ALL    2   30
9   ALL    3   40
10  ALL  ALL  150
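
Note that the output above keeps the two original rows with A=1, B=1 (10 and 20) instead of merging them into a single row with 30, as the question asks. If a recursive function that returns a dataframe is what you are after, here is a minimal sketch that stays close to the question's `buc`. The names `buc_rows` and `ALL` are made up for this example, and it assumes the measure is the last column and that the dimension values can be sorted. Instead of printing each sum, the recursion carries the dimension values chosen so far and yields one row per cell; the dataframe is built once at the end.

```python
import pandas as pd

ALL = "ALL"  # hypothetical marker for "aggregated over this dimension"


def buc_rows(df, prefix=()):
    """Yield one tuple per cell: the dimension values, then the summed measure."""
    if df.shape[1] == 1:
        # Only the measure column is left: emit the aggregated cell.
        yield prefix + (df.iloc[:, 0].sum(),)
        return
    first = df.columns[0]
    # Fix each distinct value of the first dimension and recurse on the rest.
    for val in sorted(df[first].unique()):
        sub_data = df[df[first] == val].iloc[:, 1:]
        yield from buc_rows(sub_data, prefix + (val,))
    # Then aggregate over the first dimension as a whole.
    yield from buc_rows(df.iloc[:, 1:], prefix + (ALL,))


def buc(df):
    """Collect the rows produced by the recursion into a single dataframe."""
    return pd.DataFrame(list(buc_rows(df)), columns=df.columns)


data = {'A': [1, 1, 1, 1, 2],
        'B': [1, 1, 2, 3, 1],
        'M': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data, columns=['A', 'B', 'M'])

result = buc(df)
print(result)
```

For the sample data this should give the ten rows from the question, with A=1, B=1 aggregated to 30. Collecting plain tuples and constructing the dataframe once tends to be simpler than concatenating small dataframes at every recursion level. If you prefer to stay with the groupby approach, replacing `[df]` with `[df.groupby(dim, as_index=False)[vals].sum()]` should merge the duplicate rows in the same way.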


python

pandas

data-mining