Pandas groupby() different output with versions 0.23.4 and 1.3.4

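I have 2 codebases with identical code. The only difference is the pandas version they use (0.23.4 and 1.3.4).

I debugged down to this line of code. After it runs, result is different: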

result = df.groupby(group_items, as_index=as_index, sort=sort)[sum_items].sum()

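The variables df, group_items, as_index, sort and sum_items are exactly the same in the old and new environments.

However, the returned result is slightly different in the new version. Specifically, the output looks like this:

New environment: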

df.groupby(group_items, as_index=as_index, sort=sort)[sum_items].sum()
        SST_ADJ_TYPE    SST_ADJ_RULE  ... NCI     AMOUNT
0                  0  SST22a,SST22b,  ...      1874757.0
1                  0  SST22a,SST22b,  ...      5945263.0
2                  0  SST22a,SST22b,  ...      4303110.0
3                  0  SST22a,SST22b,  ...      5342991.0
4                  0  SST22a,SST22b,  ...      9245478.0
...              ...             ...  ...  ..        ...
133674             3   SST22b,SST07,  ...      4164305.0
133675             3   SST22b,SST07,  ...      7280203.0
133676             3   SST22b,SST07,  ...      1235752.0
133677             3   SST22b,SST07,  ...      3115825.0
133678             3   SST22b,SST07,  ...      1436891.0
[133679 rows x 16 columns]

Old environment:

df.groupby(group_items, as_index=as_index, sort=sort)[sum_items].sum()
        SST_ADJ_TYPE    SST_ADJ_RULE    ...     NCI     AMOUNT
0                  0  SST22a,SST22b,    ...          1874757.0
1                  0  SST22a,SST22b,    ...          5945263.0
2                  0  SST22a,SST22b,    ...          4303110.0
3                  0  SST22a,SST22b,    ...          5342991.0
4                  0  SST22a,SST22b,    ...          9245478.0
5                  0  SST22a,SST22b,    ...          4016202.0
6                  0  SST22a,SST22b,    ...          8799969.0
7                  0  SST22a,SST22b,    ...          1503269.0
8                  0  SST22a,SST22b,    ...          6385991.0
9                  0  SST22a,SST22b,    ...          1686520.0
10                 0  SST22a,SST22b,    ...          5287114.0
11                 0  SST22a,SST22b,    ...          2648534.0
12                 0  SST22a,SST22b,    ...          6159017.0
13                 0  SST22a,SST22b,    ...          5959591.0
14                 0  SST22a,SST22b,    ...          5809998.0
15                 0  SST22a,SST22b,    ...          4929077.0
16                 0  SST22a,SST22b,    ...          9166004.0
17                 0  SST22a,SST22b,    ...          2124498.0
18                 0  SST22a,SST22b,    ...          3051659.0
19                 0  SST22a,SST22b,    ...          1859001.0
20                 0  SST22a,SST22b,    ...          8522834.0
21                 0  SST22a,SST22b,    ...          7803526.0
22                 0  SST22a,SST22b,    ...          4067546.0
23                 0  SST22a,SST22b,    ...          9218486.0
24                 0  SST22a,SST22b,    ...          1453153.0
25                 0  SST22a,SST22b,    ...          7411706.0
26                 0  SST22a,SST22b,    ...          9160444.0
27                 0  SST22a,SST22b,    ...          6255426.0
28                 0  SST22a,SST22b,    ...          6007841.0
29                 0  SST22a,SST22b,    ...          4744588.0
...              ...             ...    ...      ..        ...
133649             3   SST22b,SST07,    ...          6487572.0
133650             3   SST22b,SST07,    ...          3593805.0
133651             3   SST22b,SST07,    ...          9192954.0
133652             3   SST22b,SST07,    ...          2394981.0
133653             3   SST22b,SST07,    ...          9398971.0
133654             3   SST22b,SST07,    ...          5536294.0
133655             3   SST22b,SST07,    ...          8759613.0
133656             3   SST22b,SST07,    ...          2012212.0
133657             3   SST22b,SST07,    ...          7930551.0
133658             3   SST22b,SST07,    ...          3407871.0
133659             3   SST22b,SST07,    ...          3071541.0
133660             3   SST22b,SST07,    ...          1863129.0
133661             3   SST22b,SST07,    ...          8439646.0
133662             3   SST22b,SST07,    ...          1518097.0
133663             3   SST22b,SST07,    ...          7396702.0
133664             3   SST22b,SST07,    ...          8470274.0
133665             3   SST22b,SST07,    ...          8363095.0
133666             3   SST22b,SST07,    ...          1115614.0
133667             3   SST22b,SST07,    ...          6317772.0
133668             3   SST22b,SST07,    ...          2645613.0
133669             3   SST22b,SST07,    ...          6555039.0
133670             3   SST22b,SST07,    ...          5274987.0
133671             3   SST22b,SST07,    ...          5779789.0
133672             3   SST22b,SST07,    ...          6974948.0
133673             3   SST22b,SST07,    ...          6370779.0
133674             3   SST22b,SST07,    ...          4164305.0
133675             3   SST22b,SST07,    ...          7280203.0
133676             3   SST22b,SST07,    ...          1235752.0
133677             3   SST22b,SST07,    ...          1436891.0
133678             3   SST22b,SST07,    ...          3115825.0
[133679 rows x 16 columns]

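As you can see, the number of rows and columns is the same, and the columns of the two results are identical. However, if you check the AMOUNT column, you will see that in the last rows, for example, the result from the NEW environment has its values in a different order (the last row has been swapped with the row before it).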
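One way to confirm that the two results contain the same rows and differ only in their order is to put both into a canonical order before comparing them. The sketch below assumes you can load both results into one session (for example, by pickling them from each environment). The helper name same_rows_ignoring_order and the toy frames are hypothetical.

```python
import pandas as pd


def same_rows_ignoring_order(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Hypothetical helper: True if a and b hold the same rows in any order."""
    # Sort both frames by all of their columns so the row order is canonical,
    # then drop the old positional index, which would otherwise differ.
    cols = list(a.columns)
    a_sorted = a.sort_values(cols).reset_index(drop=True)
    b_sorted = b.sort_values(cols).reset_index(drop=True)
    return a_sorted.equals(b_sorted)


# Hypothetical toy data: "new" has the same rows as "old", in reverse order.
old = pd.DataFrame({"key": ["a", "b"], "AMOUNT": [1.0, 2.0]})
new = old.iloc[::-1]

print(old.equals(new))                    # row order (and index) differ
print(same_rows_ignoring_order(old, new))  # same rows once re-sorted
```

For a detailed report of the differences, pd.testing.assert_frame_equal can be called on the two sorted frames instead of DataFrame.equals.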

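Any idea why this is happening?

PS: Unfortunately, I can't provide a DataFrame you could load, because the one I use contains a large amount of data. I'm looking more for a theoretical answer: what changed between the pandas versions mentioned above, and which parameter I can use in the new environment to get exactly the same result as in the old one.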

Well, your data is not sorted, and it seems that pandas returns the data in a different order in those two versions. From the pandas docs:

sort in groupby only sorts the group keys; this does not influence the order of observations within each group.
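A small illustration of what the sort flag does control: it decides whether the aggregated rows come out ordered by the group keys or in the order each key first appears in the input. The toy DataFrame below is hypothetical, and the column is selected with a list (as sum_items presumably is) so that as_index=False keeps the key as a column.

```python
import pandas as pd

# Hypothetical toy data: keys deliberately not in sorted order.
df = pd.DataFrame({
    "key": ["b", "a", "b", "c", "a"],
    "AMOUNT": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# sort=True: one output row per group, ordered by the group key.
sorted_result = df.groupby("key", as_index=False, sort=True)[["AMOUNT"]].sum()

# sort=False: groups appear in the order their key first occurs in df.
unsorted_result = df.groupby("key", as_index=False, sort=False)[["AMOUNT"]].sum()

print(sorted_result)
print(unsorted_result)
```

The per-group sums are identical in both cases; only the row order changes. So if the order matters downstream, it should be imposed explicitly rather than relied upon.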

You can sort the data with .sort_values() after the groupby. See the docs for details.
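A minimal sketch of that suggestion, assuming as_index=False so that the group_items columns are ordinary columns of the result. The helper groupby_sum_sorted and the toy column names are hypothetical. Because each output row corresponds to exactly one combination of group keys, sorting by all of group_items fully determines the row order. That order should then match across pandas versions, provided the key columns have the same dtypes in both environments.

```python
import pandas as pd


def groupby_sum_sorted(df, group_items, sum_items):
    """Hypothetical helper: groupby-sum, then impose an explicit row order."""
    result = df.groupby(group_items, as_index=False)[sum_items].sum()
    # Group keys are unique per output row, so sorting by them fixes the order;
    # reset_index gives a clean 0..n-1 index like the original output.
    return result.sort_values(group_items).reset_index(drop=True)


# Hypothetical toy data
df = pd.DataFrame({
    "TYPE": [3, 0, 3, 0],
    "RULE": ["y", "x", "x", "x"],
    "AMOUNT": [1.0, 2.0, 3.0, 4.0],
})
result = groupby_sum_sorted(df, ["TYPE", "RULE"], ["AMOUNT"])
print(result)
```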