为什么在 python 中使用 pandas 合并多个帧时行会加倍

Why rows get doubled when merging multiple frames with pandas in python

我正在尝试合并我从 excel 中读取的 pandas 的多个年终结果(2013 年至 2021 年) - 以这三年为例,使用以下代码生成:

HNRrawfile_2018 = HNR_source_path+'HNR_AR2018_XLS_en.xlsx'

df2018_assets = pd.read_excel(
    HNRrawfile_2018,
    sheet_name=assets,
    skiprows=4,
    skipfooter=1,
    usecols=[0, 2])
df2018_assets.columns = ['ITEM', '2018']
df2018_assets['2018'] = df2018_assets['2018'].astype(int)
df2018_assets['ITEM'] = df2018_assets['ITEM'].str.strip().str.lower()
print(df2018_assets)
                                                 ITEM      2019
0          fixed-income securities - held to maturity    223049
1     fixed-income securities - loans and receivables   2194064
2        fixed-income securities - available for sale  38068459
3   fixed-income securities - at fair value throug...    578779
4              equity securities - available for sale     29215
5   other financial assets - at fair value through...    235019
6                                 investment property   1749517
7                                   real estate funds    534739
8                 investments in associated companies    245478
9                               other invested assets   2211905
10                             short-term investments    468350
11                          cash and cash equivalents   1090852
12    total investments and cash under own management  47629426
13                                     funds withheld  10948469
14                                  contract deposits    325302
15                                  total investments  58903197
16          reinsurance recoverables on unpaid claims   2050114
17        reinsurance recoverables on benefit reserve    852598
18                        prepaid reinsurance premium    116176
19  reinsurance recoverables on other technical re...      9355
20                         deferred acquisition costs   2931722
21                                accounts receivable   5269792
22                                           goodwill     88303
23                                deferred tax assets    442469
24                                       other assets    640956
25                          accrued interest and rent     15414
26                               assets held for sale     36308
27                                       total assets  71356404

                                                 ITEM      2018
0          fixed-income securities – held to maturity    249943
1     fixed-income securities – loans and receivables   2398950
2        fixed-income securities – available for sale  33239685
3   fixed-income securities – at fair value throug...    559750
4              equity securities – available for sale     28729
5   other financial assets – at fair value through...    190759
6                                 investment property   1684932
7                                   real estate funds    433899
8                 investments in associated companies    110545
9                               other invested assets   1805281
10                             short-term investments    421950
11                          cash and cash equivalents   1072915
12    total investments and cash under own management  42197338
13                                     funds withheld  10691768
14                                  contract deposits    172873
15                                  total investments  53061979
16          reinsurance recoverables on unpaid claims   2084630
17        reinsurance recoverables on benefit reserve    909056
18                        prepaid reinsurance premium     93678
19  reinsurance recoverables on other technical re...      7170
20                         deferred acquisition costs   2155820
21                                accounts receivable   3975778
22                                           goodwill     85588
23                                deferred tax assets    454608
24                                       other assets    629420
25                          accrued interest and rent     11726
26                               assets held for sale   1039184
27                                       total assets  64508637

                                                 ITEM      2017
0          fixed-income securities – held to maturity    336182
1     fixed-income securities – loans and receivables   2455164
2        fixed-income securities – available for sale  31281908
3   fixed-income securities – at fair value throug...    212042
4              equity securities – available for sale     37520
5   other financial assets – at fair value through...     88832
6                   real estate and real estate funds   1968702
7                 investments in associated companies    121075
8                               other invested assets   1761678
9                              short-term investments    958669
10                          cash and cash equivalents    835706
11    total investments and cash under own management  40057478
12                                     funds withheld  10735012
13                                  contract deposits    167854
14                                  total investments  50960344
15          reinsurance recoverables on unpaid claims   1651335
16        reinsurance recoverables on benefit reserve    959533
17                        prepaid reinsurance premium     96402
18  reinsurance recoverables on other technical re...      7301
19                         deferred acquisition costs   2228246
20                                accounts receivable   3821124
21                                           goodwill     91692
22                                deferred tax assets    466564
23                                       other assets    904253
24                          accrued interest and rent     10052
25                               assets held for sale         0
26                                       total assets  61196846

然而,有些行会翻倍,尤其是固定收益行,这些行在 2021 年至 2019 年期间出现在顶部,但从 2018 年开始在底部翻了一番:

                                                 ITEM        2021        2020        2019        2018        2017        2016        2015        2014        2013
0          fixed-income securities - held to maturity     48632.0    185577.0    223049.0         NaN         NaN         NaN         NaN         NaN         NaN
1     fixed-income securities - loans and receivables   2443629.0   2532146.0   2194064.0         NaN         NaN         NaN         NaN         NaN         NaN
2        fixed-income securities - available for sale  45473677.0  38851723.0  38068459.0         NaN         NaN         NaN         NaN         NaN         NaN
3   fixed-income securities - at fair value throug...     81308.0    105711.0    578779.0         NaN         NaN         NaN         NaN         NaN         NaN
4              equity securities - available for sale    314453.0    378422.0     29215.0         NaN         NaN         NaN         NaN         NaN         NaN
5   other financial assets - at fair value through...    248233.0    234689.0    235019.0         NaN         NaN         NaN         NaN         NaN         NaN
6                                 investment property   1818754.0   1589238.0   1749517.0   1684932.0         NaN         NaN         NaN         NaN         NaN
7                                   real estate funds    805912.0    582296.0    534739.0    433899.0         NaN         NaN         NaN         NaN         NaN
8                 investments in associated companies    238110.0    361617.0    245478.0    110545.0    121075.0    114633.0    128008.0    154822.0    144489.0
9                               other invested assets   2941633.0   2794016.0   2211905.0   1805281.0   1761678.0   1764678.0   1544533.0   1316604.0   1023214.0
10                             short-term investments    443793.0    327426.0    468350.0    421950.0    958669.0    838987.0   1113130.0    575300.0    549138.0
11                          cash and cash equivalents   1355114.0   1278071.0   1090852.0   1072915.0    835706.0    848667.0    792604.0    772882.0    642936.0
12    total investments and cash under own management  56213248.0  49220932.0  47629426.0  42197338.0  40057478.0  41793495.0  39346903.0  36228010.0  31875242.0
13                                     funds withheld  10803071.0   9659807.0  10948469.0  10691768.0  10735012.0  11673259.0  13801845.0  15826480.0  14267831.0
14                                  contract deposits    503412.0    298344.0    325302.0    172873.0    167854.0    170505.0    188604.0     92069.0     75541.0
15                                  total investments  67519731.0  59179083.0  58903197.0  53061979.0  50960344.0  53637259.0  53337352.0  52146559.0  46218614.0
16          reinsurance recoverables on unpaid claims   2674107.0   1883270.0   2050114.0   2084630.0   1651335.0   1506292.0   1395281.0   1376432.0   1403804.0
17        reinsurance recoverables on benefit reserve    192039.0    192135.0    852598.0    909056.0    959533.0   1189420.0   1367173.0    676219.0    344154.0
18                        prepaid reinsurance premium    204597.0    165916.0    116176.0     93678.0     96402.0    134927.0    164023.0    149257.0    139039.0
19  reinsurance recoverables on other technical re...      2703.0      1106.0      9355.0      7170.0      7301.0     12231.0      8687.0      5446.0      6893.0
20                         deferred acquisition costs   3350633.0   2857071.0   2931722.0   2155820.0   2228246.0   2198089.0   2094671.0   1914598.0   1672398.0
21                                accounts receivable   7207750.0   5605803.0   5269792.0   3975778.0   3821124.0   3678030.0   3665937.0   3113978.0   2945685.0
22                                           goodwill     83933.0     80965.0     88303.0     85588.0     91692.0     64609.0     60244.0     58220.0     57070.0
23                                deferred tax assets    676344.0    597986.0    442469.0    454608.0    466564.0    408292.0    433500.0    393923.0    508841.0
24                                       other assets    972167.0    858170.0    640956.0    629420.0    904253.0    674389.0    680543.0    618280.0    603627.0
25                          accrued interest and rent     18248.0     18264.0     15414.0     11726.0     10052.0      9978.0      7527.0      4672.0      4193.0
26                                       total assets  82902252.0  71439769.0  71356404.0  64508637.0  61196846.0  63528602.0  63214938.0  60457584.0  53915544.0
27                               assets held for sale         NaN         0.0     36308.0   1039184.0         0.0     15086.0         NaN         0.0     11226.0
28         fixed-income securities – held to maturity         NaN         NaN         NaN    249943.0    336182.0    484955.0   1007665.0   2139742.0   2666787.0
29    fixed-income securities – loans and receivables         NaN         NaN         NaN   2398950.0   2455164.0   2563594.0   2869865.0   2988187.0   3209100.0
30       fixed-income securities – available for sale         NaN         NaN         NaN  33239685.0  31281908.0  32182173.0  29616448.0  26817523.0  22409892.0
31  fixed-income securities – at fair value throug...         NaN         NaN         NaN    559750.0    212042.0    239917.0    108982.0     64494.0     36061.0
32             equity securities – available for sale         NaN         NaN         NaN     28729.0     37520.0    905307.0    452108.0     32804.0     28980.0
33  other financial assets – at fair value through...         NaN         NaN         NaN    190759.0     88832.0     57665.0     39602.0     66394.0     70082.0
34                  real estate and real estate funds         NaN         NaN         NaN         NaN   1968702.0   1792919.0   1673958.0   1299258.0   1094563.0

合并是用outer做的,因为有些职位每年都没有出现,我想确保从左到右收集每一行table:

df_assets_1 = df2021_assets.merge(
    df2020_assets, how='outer', on='ITEM').merge(
    df2019_assets, how='outer', on='ITEM').merge(
    df2018_assets, how='outer', on='ITEM').merge(
    df2017_assets, how='outer', on='ITEM').merge(
    df2016_assets, how='outer', on='ITEM').merge(
    df2015_assets, how='outer', on='ITEM').merge(
    df2014_assets, how='outer', on='ITEM').merge(
    df2013_assets, how='outer', on='ITEM')

我尝试通过

修复它

非常感谢任何帮助 - 非常感谢您抽出宝贵的时间!

我检查了在 PowerQuery 中使用 M 加入的过程,几乎每个步骤后都需要清理数据,因为由于财务结果的性质(可能有一个总体 'title-row'(例如股东价值)和另一行具有相同名称的股东价值总和)。

因此,在每次外部合并之后,必须删除可能的重复项,然后才能继续进行下一次外部合并。这导致了重复。