Subtracting pivot tables with different dimensions

I have two pivot tables in pandas, pre_pt and sd_pt, which may not share the same columns or index values. For example:

pre_pt=
risk_type     R1             R2
qualifier                            
A        1.958512e+06 -10718.288787
B       -1.008596e+04      1.933457
C        0.000000e+00      0.329764
D        2.390952e+03   5726.806464
E        1.002147e+04     -4.661991
F       -2.016144e+06  12807.479302

sd_pt=
risk_type     R1               R3           R4
qualifier                                          
A        1.936494e+06      0.000000 -10425.198385
B       -1.010489e+04      0.000000      1.107070
C        0.000000e+00      0.000000      0.568966
D        2.648684e+03      0.000000   5640.661105
E        1.001735e+04      0.000000     -3.839769
F       -2.006834e+06      0.000000  12633.668916
G        0.000000e+00  11589.966215      0.000000

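I want to compute the difference between the two in a new DataFrame, so that I can concatenate the three DataFrames into a single report. To do that, I need to fill the elements that are not present in both DataFrames with zeros so that the tables can be subtracted. I use the following code to do this. It works, but I would like to know whether pandas has a built-in solution:
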
# create a set of the columns and indices
my_cols = set()
my_qs = set()
my_pts = [pre_pt, sd_pt]

for pts in my_pts:
    my_cols.update(pts.columns.tolist())
    my_qs.update(pts.index.tolist())

# now add the cols, indices that don't exist
for pts in my_pts:
    pts_cols = pts.columns.tolist()
    pts_qs = pts.index.tolist()
    for c in my_cols:
        if c not in pts_cols:
            pts[c] = 0.0

    for q in my_qs:
        if q not in pts_qs:
            pts.loc[q] = 0.0

    # sort in place so the change reaches pre_pt and sd_pt
    # (rebinding the loop variable pts would leave them unsorted)
    pts.sort_index(axis=1, inplace=True)
    pts.sort_index(inplace=True)

diff = sd_pt - pre_pt

# concatenate all of the pivot tables
risk_sd = pd.concat([pre_pt, sd_pt, diff], axis=1,
                    keys=[pre_start_date, start_date, "Difference"], sort=True)

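You can use DataFrame.sub. It aligns both tables on the union of their row and column labels, and fill_value=0 substitutes 0 wherever a value is missing from only one of them: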

diff = sd_pt.sub(pre_pt, fill_value=0)
risk_sd = pd.concat([pre_pt, sd_pt, diff], axis=1, sort=True,
                    keys=['pre_start_date', 'start_date', 'Difference'])
print(risk_sd)

# Output
          pre_start_date                 start_date                             Difference                                          
risk_type             R1            R2           R1            R3            R4         R1            R2            R3            R4
qualifier                                                                                                                           
A            1958512.000 -10718.288787  1936494.000      0.000000 -10425.198385 -22018.000  10718.288787      0.000000 -10425.198385
B             -10085.960      1.933457   -10104.890      0.000000      1.107070    -18.930     -1.933457      0.000000      1.107070
C                  0.000      0.329764        0.000      0.000000      0.568966      0.000     -0.329764      0.000000      0.568966
D               2390.952   5726.806464     2648.684      0.000000   5640.661105    257.732  -5726.806464      0.000000   5640.661105
E              10021.470     -4.661991    10017.350      0.000000     -3.839769     -4.120      4.661991      0.000000     -3.839769
F           -2016144.000  12807.479302 -2006834.000      0.000000  12633.668916   9310.000 -12807.479302      0.000000  12633.668916
G                    NaN           NaN        0.000  11589.966215      0.000000      0.000           NaN  11589.966215      0.000000
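
The NaN at row G, column R2 of the Difference block appears because that cell is absent from both tables, and fill_value only applies when exactly one side is missing. A minimal sketch of that behaviour, using hypothetical miniature frames (pre and sd are illustrative stand-ins, not the data above):

```python
import pandas as pd

# Hypothetical miniature frames standing in for pre_pt and sd_pt.
pre = pd.DataFrame({"R1": [1.0, 2.0], "R2": [10.0, 20.0]}, index=["A", "B"])
sd = pd.DataFrame({"R1": [1.5, 2.5, 0.0], "R3": [0.0, 0.0, 7.0]},
                  index=["A", "B", "G"])

# Labels are aligned on their union; a value missing from one side is
# treated as 0 before subtracting.
diff = sd.sub(pre, fill_value=0)

# R2 exists only in pre, so rows A and B become 0 - pre.
# Row G / column R2 exists in neither frame, so it stays NaN.
print(diff)
```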

Append .fillna(0) after pd.concat(...) if you want the remaining NaN values replaced with 0.
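
If you would rather have explicit zeros in every block before concatenating, one alternative that stays close to the manual loop in the question is to reindex both tables onto the union of their labels first. This is only a sketch under assumptions: the miniature frames and the names rows, cols, pre_full and sd_full are hypothetical, and the keys are placeholder strings.

```python
import pandas as pd

# Hypothetical miniature frames standing in for pre_pt and sd_pt.
pre_pt = pd.DataFrame({"R1": [1.0, 2.0], "R2": [10.0, 20.0]}, index=["A", "B"])
sd_pt = pd.DataFrame({"R1": [1.5, 2.5, 0.0], "R3": [0.0, 0.0, 7.0]},
                     index=["A", "B", "G"])

# Union of the row and column labels across both tables.
rows = pre_pt.index.union(sd_pt.index)
cols = pre_pt.columns.union(sd_pt.columns)

# Expand each table to the full grid, writing 0.0 into cells it did not have.
pre_full = pre_pt.reindex(index=rows, columns=cols, fill_value=0.0)
sd_full = sd_pt.reindex(index=rows, columns=cols, fill_value=0.0)

# Both frames now share identical labels, so plain subtraction leaves no NaN.
diff = sd_full - pre_full

report = pd.concat([pre_full, sd_full, diff], axis=1,
                   keys=["pre", "sd", "Difference"])
print(report)
```

Compared with sub(..., fill_value=0) followed by fillna(0), this makes every block of the report show the same set of columns, at the cost of two extra reindex calls.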