减去不同维度的数据透视表
Subtracting Pivot Tables with different dimensions
我在 pandas pre_pt 和 sd_pt 中有 2 个数据透视表,它们可能没有相同的列或索引值。例如,
pre_pt=
risk_type R1 R2
qualifier
A 1.958512e+06 -10718.288787
B -1.008596e+04 1.933457
C 0.000000e+00 0.329764
D 2.390952e+03 5726.806464
E 1.002147e+04 -4.661991
F -2.016144e+06 12807.479302
sd_pt=
risk_type R1 R3 R4
qualifier
A 1.936494e+06 0.000000 -10425.198385
B -1.010489e+04 0.000000 1.107070
C 0.000000e+00 0.000000 0.568966
D 2.648684e+03 0.000000 5640.661105
E 1.001735e+04 0.000000 -3.839769
F -2.006834e+06 0.000000 12633.668916
G 0.000000e+00 11589.966215 0.000000
我希望能够在一个新的数据框中找出两者之间的区别,这样我就可以将这 3 个数据框连接成一个报告,我需要填写两个数据中都不存在的元素-框架为零,以便它们可以被减去。我使用以下代码执行此操作。哪个有效,但想知道 pandas
中是否有内置解决方案
# create a set of the columns and indices
my_cols = set()
my_qs = set()
my_pts = [pre_pt, sd_pt]
for pts in my_pts:
my_cols.update(pts.columns.tolist())
my_qs.update(pts.index.tolist())
#now add the cols, indices that don't exist
for pts in my_pts:
pts_cols = pts.columns.tolist()
pts_qs = pts.index.tolist()
for c in my_cols:
if c not in pts_cols:
pts[c] = 0.0
for q in my_qs:
if q not in pts_qs:
pts.loc[q] = 0.0
pts = pts.sort_index(axis=1)
pts = pts.sort_index()
diff = sd_pt - pre_pt
#concatenate all of the pivot tables
risk_sd = pd.concat([pre_pt, sd_pt, diff], axis = 1, keys = [pre_start_date, start_date, "Difference"], sort =True)
您可以使用 sub
:
diff = sd_pt.sub(pre_pt, fill_value=0)
risk_sd = pd.concat([pre_pt, sd_pt, diff], axis=1, sort=True,
keys=['pre_start_date', 'start_date', 'Difference'])
print(risk_sd)
# Output
pre_start_date start_date Difference
risk_type R1 R2 R1 R3 R4 R1 R2 R3 R4
qualifier
A 1958512.000 -10718.288787 1936494.000 0.000000 -10425.198385 -22018.000 10718.288787 0.000000 -10425.198385
B -10085.960 1.933457 -10104.890 0.000000 1.107070 -18.930 -1.933457 0.000000 1.107070
C 0.000 0.329764 0.000 0.000000 0.568966 0.000 -0.329764 0.000000 0.568966
D 2390.952 5726.806464 2648.684 0.000000 5640.661105 257.732 -5726.806464 0.000000 5640.661105
E 10021.470 -4.661991 10017.350 0.000000 -3.839769 -4.120 4.661991 0.000000 -3.839769
F -2016144.000 12807.479302 -2006834.000 0.000000 12633.668916 9310.000 -12807.479302 0.000000 12633.668916
G NaN NaN 0.000 11589.966215 0.000000 0.000 NaN 11589.966215 0.000000
在 pd.concat(...)
之后追加 .fillna(0)
以根据需要用 0
填充 NaN
。
我在 pandas pre_pt 和 sd_pt 中有 2 个数据透视表,它们可能没有相同的列或索引值。例如,
pre_pt=
risk_type R1 R2
qualifier
A 1.958512e+06 -10718.288787
B -1.008596e+04 1.933457
C 0.000000e+00 0.329764
D 2.390952e+03 5726.806464
E 1.002147e+04 -4.661991
F -2.016144e+06 12807.479302
sd_pt=
risk_type R1 R3 R4
qualifier
A 1.936494e+06 0.000000 -10425.198385
B -1.010489e+04 0.000000 1.107070
C 0.000000e+00 0.000000 0.568966
D 2.648684e+03 0.000000 5640.661105
E 1.001735e+04 0.000000 -3.839769
F -2.006834e+06 0.000000 12633.668916
G 0.000000e+00 11589.966215 0.000000
我希望能够在一个新的数据框中找出两者之间的区别,这样我就可以将这 3 个数据框连接成一个报告,我需要填写两个数据中都不存在的元素-框架为零,以便它们可以被减去。我使用以下代码执行此操作。哪个有效,但想知道 pandas
中是否有内置解决方案 # create a set of the columns and indices
my_cols = set()
my_qs = set()
my_pts = [pre_pt, sd_pt]
for pts in my_pts:
my_cols.update(pts.columns.tolist())
my_qs.update(pts.index.tolist())
#now add the cols, indices that don't exist
for pts in my_pts:
pts_cols = pts.columns.tolist()
pts_qs = pts.index.tolist()
for c in my_cols:
if c not in pts_cols:
pts[c] = 0.0
for q in my_qs:
if q not in pts_qs:
pts.loc[q] = 0.0
pts = pts.sort_index(axis=1)
pts = pts.sort_index()
diff = sd_pt - pre_pt
#concatenate all of the pivot tables
risk_sd = pd.concat([pre_pt, sd_pt, diff], axis = 1, keys = [pre_start_date, start_date, "Difference"], sort =True)
您可以使用 sub
:
diff = sd_pt.sub(pre_pt, fill_value=0)
risk_sd = pd.concat([pre_pt, sd_pt, diff], axis=1, sort=True,
keys=['pre_start_date', 'start_date', 'Difference'])
print(risk_sd)
# Output
pre_start_date start_date Difference
risk_type R1 R2 R1 R3 R4 R1 R2 R3 R4
qualifier
A 1958512.000 -10718.288787 1936494.000 0.000000 -10425.198385 -22018.000 10718.288787 0.000000 -10425.198385
B -10085.960 1.933457 -10104.890 0.000000 1.107070 -18.930 -1.933457 0.000000 1.107070
C 0.000 0.329764 0.000 0.000000 0.568966 0.000 -0.329764 0.000000 0.568966
D 2390.952 5726.806464 2648.684 0.000000 5640.661105 257.732 -5726.806464 0.000000 5640.661105
E 10021.470 -4.661991 10017.350 0.000000 -3.839769 -4.120 4.661991 0.000000 -3.839769
F -2016144.000 12807.479302 -2006834.000 0.000000 12633.668916 9310.000 -12807.479302 0.000000 12633.668916
G NaN NaN 0.000 11589.966215 0.000000 0.000 NaN 11589.966215 0.000000
在 pd.concat(...)
之后追加 .fillna(0)
以根据需要用 0
填充 NaN
。