Python sum of values in dataset

Question

I have this dataframe (ID is a string and Value is a float):

ID              Value
1               0.0
1.1             0.0
1.2             0.0
1.2.1           27508.42
1.2.2           25861.82
1.3             0.0
1.3.1           0.0
1.3.1.1         0.0
1.3.1.2         0.0
1.3.1.3         30396.25

whose structure looks like this:

1
├── 1.1  
├── 1.2  
│   ├── 1.2.1  
│   └── 1.2.2  
└── 1.3  
    └── 1.3.1  
        ├── 1.3.1.1
        ├── 1.3.1.2    
        └── 1.3.1.3

I need the value of each 'parent' node to be the sum of its leaves. So:

ID              Value
1               83766.49     (1.1 + 1.2 + 1.3)
1.1             0.0
1.2             53370.24     (1.2.1 + 1.2.2)
1.2.1           27508.42
1.2.2           25861.82
1.3             30396.25     (1.3.1)
1.3.1           30396.25     (1.3.1.1 + 1.3.1.2 + 1.3.1.3)
1.3.1.1         0.0
1.3.1.2         0.0
1.3.1.3         30396.25

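How can I group by the IDs? Using groupby does not work because all the IDs are unique. Should I change the structure of the dataframe to better reflect the logic of the hierarchy?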

Answer 1

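You can find which IDs are the "sub-IDs" of each ID (the ID itself plus everything nested beneath it), and then sum the values of those sub-IDs: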

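from collections import defaultdict

# Map each ID to itself plus every ID nested beneath it.
d = defaultdict(list)
ids = df['ID'].tolist()
for a_val in ids:
    for b_val in ids:
        # Match whole segments so that '1.10' is not counted under '1.1'.
        if b_val == a_val or b_val.startswith(a_val + '.'):
            d[a_val].append(b_val)

# Each ID's total is the sum of Value over its sub-IDs.
for b_val in ids:
    df.loc[df['ID'] == b_val, 'total'] = df.loc[df['ID'].isin(d[b_val]), 'Value'].sum()
print(df)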

        ID     Value     total
0        1      0.00  83766.49
1      1.1      0.00      0.00
2      1.2      0.00  53370.24
3    1.2.1  27508.42  27508.42
4    1.2.2  25861.82  25861.82
5      1.3      0.00  30396.25
6    1.3.1      0.00  30396.25
7  1.3.1.1      0.00      0.00
8  1.3.1.2      0.00      0.00
9  1.3.1.3  30396.25  30396.25
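
The nested loop above compares every ID with every other ID. As a rough alternative sketch (not the approach used in this answer), each row's value can instead be added to the row itself and to each of its ancestors in a single pass. This assumes the IDs are dot-separated paths, as in the question; the `totals` dictionary and the rebuilt `df` are only for illustration.

```python
from collections import defaultdict

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["1", "1.1", "1.2", "1.2.1", "1.2.2",
           "1.3", "1.3.1", "1.3.1.1", "1.3.1.2", "1.3.1.3"],
    "Value": [0.0, 0.0, 0.0, 27508.42, 25861.82,
              0.0, 0.0, 0.0, 0.0, 30396.25],
})

totals = defaultdict(float)
for id_, value in zip(df["ID"], df["Value"]):
    parts = id_.split(".")
    # Credit the value to the row itself and to every ancestor,
    # e.g. "1.2.1" contributes to "1.2.1", "1.2" and "1".
    for depth in range(1, len(parts) + 1):
        totals[".".join(parts[:depth])] += value

df["total"] = df["ID"].map(dict(totals))
print(df)
```

Because the IDs are split on the dots, an ID such as `1.10` would be credited to `1` but not to `1.1`, which a plain prefix check cannot guarantee.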

Answer 2

Another solution (assuming the ID column is sorted):

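def counter(x):
    # x is the Value column in reversed ID order, so children come before parents.
    out = []
    for id_, v in zip(x.index, x):
        # Sum the already-accumulated totals of the direct children of id_.
        s = sum(
            child_total
            for a, child_total in out
            if a.startswith(id_ + ".") and a.count(".") == id_.count(".") + 1
        )
        out.append((id_, s + v))
    return [v for _, v in out]


print(df.set_index("ID")[::-1].apply(counter)[::-1].reset_index())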

Prints:

        ID     Value
0        1  83766.49
1      1.1      0.00
2      1.2  53370.24
3    1.2.1  27508.42
4    1.2.2  25861.82
5      1.3  30396.25
6    1.3.1  30396.25
7  1.3.1.1      0.00
8  1.3.1.2      0.00
9  1.3.1.3  30396.25
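
This solution depends on the rows being in hierarchical order. If they are not, a plain string sort may not be enough, because `"1.10"` sorts before `"1.2"` as text. A minimal sketch of one way to order the rows first, assuming every ID segment is an integer; the `id_key` helper and the small example frame are hypothetical:

```python
import pandas as pd


def id_key(id_str):
    # "1.10.2" -> (1, 10, 2), so segments compare as numbers, not as text.
    return tuple(int(part) for part in id_str.split("."))


# Hypothetical unsorted example, not the data from the question.
df = pd.DataFrame({
    "ID": ["1.10", "1.2", "1", "1.2.1"],
    "Value": [1.0, 0.0, 0.0, 2.0],
})

# Compute the row order in plain Python, then reorder the frame.
order = sorted(range(len(df)), key=lambda i: id_key(df["ID"].iloc[i]))
df_sorted = df.iloc[order].reset_index(drop=True)
print(df_sorted)
```

Tuples compare element by element, and a shorter tuple that is a prefix of a longer one sorts first, so each parent ends up directly ahead of its descendants.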
