Why rows get doubled when merging multiple frames with pandas in Python
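I'm trying to merge several year-end results (2013 to 2021) that I read from Excel with pandas. Taking three of the years (2019, 2018 and 2017) as examples, each frame is generated with code like the following (shown for 2018):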
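import pandas as pd

# HNR_source_path (folder) and assets (sheet name) are defined earlier in the script (not shown)
HNRrawfile_2018 = HNR_source_path + 'HNR_AR2018_XLS_en.xlsx'
df2018_assets = pd.read_excel(
    HNRrawfile_2018,
    sheet_name=assets,
    skiprows=4,
    skipfooter=1,
    usecols=[0, 2])
df2018_assets.columns = ['ITEM', '2018']
df2018_assets['2018'] = df2018_assets['2018'].astype(int)
# normalise the item labels: trim surrounding whitespace and lower-case them
df2018_assets['ITEM'] = df2018_assets['ITEM'].str.strip().str.lower()
print(df2018_assets)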
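Because the same cleanup is repeated for every year, it could be wrapped in one function so all nine frames are treated identically. A minimal sketch: clean_year is a hypothetical helper, and its input is assumed to be the two-column frame that pd.read_excel(..., usecols=[0, 2]) returns; an in-memory frame stands in for the Excel sheet here.

```python
import pandas as pd


def clean_year(raw: pd.DataFrame, year: str) -> pd.DataFrame:
    """Apply the per-year cleanup to one raw two-column frame (hypothetical helper)."""
    df = raw.copy()
    df.columns = ['ITEM', year]
    df[year] = df[year].astype(int)
    # same label normalisation as in the question
    df['ITEM'] = df['ITEM'].str.strip().str.lower()
    return df


# In-memory stand-in for what read_excel would return for 2018
raw_2018 = pd.DataFrame({
    'label': ['  Goodwill ', 'Total assets'],
    'value': [85588, 64508637],
})
df2018_assets = clean_year(raw_2018, '2018')
print(df2018_assets)
```

In the real script, raw would come from the pd.read_excel call above, once per year.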
ITEM 2019
0 fixed-income securities - held to maturity 223049
1 fixed-income securities - loans and receivables 2194064
2 fixed-income securities - available for sale 38068459
3 fixed-income securities - at fair value throug... 578779
4 equity securities - available for sale 29215
5 other financial assets - at fair value through... 235019
6 investment property 1749517
7 real estate funds 534739
8 investments in associated companies 245478
9 other invested assets 2211905
10 short-term investments 468350
11 cash and cash equivalents 1090852
12 total investments and cash under own management 47629426
13 funds withheld 10948469
14 contract deposits 325302
15 total investments 58903197
16 reinsurance recoverables on unpaid claims 2050114
17 reinsurance recoverables on benefit reserve 852598
18 prepaid reinsurance premium 116176
19 reinsurance recoverables on other technical re... 9355
20 deferred acquisition costs 2931722
21 accounts receivable 5269792
22 goodwill 88303
23 deferred tax assets 442469
24 other assets 640956
25 accrued interest and rent 15414
26 assets held for sale 36308
27 total assets 71356404
ITEM 2018
0 fixed-income securities – held to maturity 249943
1 fixed-income securities – loans and receivables 2398950
2 fixed-income securities – available for sale 33239685
3 fixed-income securities – at fair value throug... 559750
4 equity securities – available for sale 28729
5 other financial assets – at fair value through... 190759
6 investment property 1684932
7 real estate funds 433899
8 investments in associated companies 110545
9 other invested assets 1805281
10 short-term investments 421950
11 cash and cash equivalents 1072915
12 total investments and cash under own management 42197338
13 funds withheld 10691768
14 contract deposits 172873
15 total investments 53061979
16 reinsurance recoverables on unpaid claims 2084630
17 reinsurance recoverables on benefit reserve 909056
18 prepaid reinsurance premium 93678
19 reinsurance recoverables on other technical re... 7170
20 deferred acquisition costs 2155820
21 accounts receivable 3975778
22 goodwill 85588
23 deferred tax assets 454608
24 other assets 629420
25 accrued interest and rent 11726
26 assets held for sale 1039184
27 total assets 64508637
ITEM 2017
0 fixed-income securities – held to maturity 336182
1 fixed-income securities – loans and receivables 2455164
2 fixed-income securities – available for sale 31281908
3 fixed-income securities – at fair value throug... 212042
4 equity securities – available for sale 37520
5 other financial assets – at fair value through... 88832
6 real estate and real estate funds 1968702
7 investments in associated companies 121075
8 other invested assets 1761678
9 short-term investments 958669
10 cash and cash equivalents 835706
11 total investments and cash under own management 40057478
12 funds withheld 10735012
13 contract deposits 167854
14 total investments 50960344
15 reinsurance recoverables on unpaid claims 1651335
16 reinsurance recoverables on benefit reserve 959533
17 prepaid reinsurance premium 96402
18 reinsurance recoverables on other technical re... 7301
19 deferred acquisition costs 2228246
20 accounts receivable 3821124
21 goodwill 91692
22 deferred tax assets 466564
23 other assets 904253
24 accrued interest and rent 10052
25 assets held for sale 0
26 total assets 61196846
However, some rows get doubled, especially the fixed-income rows: they appear at the top for 2021 to 2019, but from 2018 onward they are duplicated at the bottom:
ITEM 2021 2020 2019 2018 2017 2016 2015 2014 2013
0 fixed-income securities - held to maturity 48632.0 185577.0 223049.0 NaN NaN NaN NaN NaN NaN
1 fixed-income securities - loans and receivables 2443629.0 2532146.0 2194064.0 NaN NaN NaN NaN NaN NaN
2 fixed-income securities - available for sale 45473677.0 38851723.0 38068459.0 NaN NaN NaN NaN NaN NaN
3 fixed-income securities - at fair value throug... 81308.0 105711.0 578779.0 NaN NaN NaN NaN NaN NaN
4 equity securities - available for sale 314453.0 378422.0 29215.0 NaN NaN NaN NaN NaN NaN
5 other financial assets - at fair value through... 248233.0 234689.0 235019.0 NaN NaN NaN NaN NaN NaN
6 investment property 1818754.0 1589238.0 1749517.0 1684932.0 NaN NaN NaN NaN NaN
7 real estate funds 805912.0 582296.0 534739.0 433899.0 NaN NaN NaN NaN NaN
8 investments in associated companies 238110.0 361617.0 245478.0 110545.0 121075.0 114633.0 128008.0 154822.0 144489.0
9 other invested assets 2941633.0 2794016.0 2211905.0 1805281.0 1761678.0 1764678.0 1544533.0 1316604.0 1023214.0
10 short-term investments 443793.0 327426.0 468350.0 421950.0 958669.0 838987.0 1113130.0 575300.0 549138.0
11 cash and cash equivalents 1355114.0 1278071.0 1090852.0 1072915.0 835706.0 848667.0 792604.0 772882.0 642936.0
12 total investments and cash under own management 56213248.0 49220932.0 47629426.0 42197338.0 40057478.0 41793495.0 39346903.0 36228010.0 31875242.0
13 funds withheld 10803071.0 9659807.0 10948469.0 10691768.0 10735012.0 11673259.0 13801845.0 15826480.0 14267831.0
14 contract deposits 503412.0 298344.0 325302.0 172873.0 167854.0 170505.0 188604.0 92069.0 75541.0
15 total investments 67519731.0 59179083.0 58903197.0 53061979.0 50960344.0 53637259.0 53337352.0 52146559.0 46218614.0
16 reinsurance recoverables on unpaid claims 2674107.0 1883270.0 2050114.0 2084630.0 1651335.0 1506292.0 1395281.0 1376432.0 1403804.0
17 reinsurance recoverables on benefit reserve 192039.0 192135.0 852598.0 909056.0 959533.0 1189420.0 1367173.0 676219.0 344154.0
18 prepaid reinsurance premium 204597.0 165916.0 116176.0 93678.0 96402.0 134927.0 164023.0 149257.0 139039.0
19 reinsurance recoverables on other technical re... 2703.0 1106.0 9355.0 7170.0 7301.0 12231.0 8687.0 5446.0 6893.0
20 deferred acquisition costs 3350633.0 2857071.0 2931722.0 2155820.0 2228246.0 2198089.0 2094671.0 1914598.0 1672398.0
21 accounts receivable 7207750.0 5605803.0 5269792.0 3975778.0 3821124.0 3678030.0 3665937.0 3113978.0 2945685.0
22 goodwill 83933.0 80965.0 88303.0 85588.0 91692.0 64609.0 60244.0 58220.0 57070.0
23 deferred tax assets 676344.0 597986.0 442469.0 454608.0 466564.0 408292.0 433500.0 393923.0 508841.0
24 other assets 972167.0 858170.0 640956.0 629420.0 904253.0 674389.0 680543.0 618280.0 603627.0
25 accrued interest and rent 18248.0 18264.0 15414.0 11726.0 10052.0 9978.0 7527.0 4672.0 4193.0
26 total assets 82902252.0 71439769.0 71356404.0 64508637.0 61196846.0 63528602.0 63214938.0 60457584.0 53915544.0
27 assets held for sale NaN 0.0 36308.0 1039184.0 0.0 15086.0 NaN 0.0 11226.0
28 fixed-income securities – held to maturity NaN NaN NaN 249943.0 336182.0 484955.0 1007665.0 2139742.0 2666787.0
29 fixed-income securities – loans and receivables NaN NaN NaN 2398950.0 2455164.0 2563594.0 2869865.0 2988187.0 3209100.0
30 fixed-income securities – available for sale NaN NaN NaN 33239685.0 31281908.0 32182173.0 29616448.0 26817523.0 22409892.0
31 fixed-income securities – at fair value throug... NaN NaN NaN 559750.0 212042.0 239917.0 108982.0 64494.0 36061.0
32 equity securities – available for sale NaN NaN NaN 28729.0 37520.0 905307.0 452108.0 32804.0 28980.0
33 other financial assets – at fair value through... NaN NaN NaN 190759.0 88832.0 57665.0 39602.0 66394.0 70082.0
34 real estate and real estate funds NaN NaN NaN NaN 1968702.0 1792919.0 1673958.0 1299258.0 1094563.0
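One difference is visible in the printed frames above: the 2019 labels separate item and category with a hyphen-minus ("-", U+002D), while the 2018 and 2017 labels use an en dash ("–", U+2013). str.strip() and str.lower() leave both characters untouched, so to merge these are different keys, which would be consistent with exactly the fixed-income, equity and other-financial-assets rows splitting at 2018. A minimal sketch, assuming the en dash is the only difference; normalize_dashes is a hypothetical helper.

```python
import pandas as pd

# Labels copied from the printed frames: 2019 uses a hyphen-minus,
# 2018 uses an en dash (U+2013) between the item and its category.
df2019 = pd.DataFrame({'ITEM': ['fixed-income securities - held to maturity'],
                       '2019': [223049]})
df2018 = pd.DataFrame({'ITEM': ['fixed-income securities \u2013 held to maturity'],
                       '2018': [249943]})

# The keys are different strings, so the outer merge keeps both rows
doubled = df2019.merge(df2018, how='outer', on='ITEM')
print(len(doubled))  # 2


def normalize_dashes(s: pd.Series) -> pd.Series:
    """Replace every en dash with a plain hyphen-minus (hypothetical helper)."""
    return s.str.replace('\u2013', '-', regex=False)


df2019['ITEM'] = normalize_dashes(df2019['ITEM'])
df2018['ITEM'] = normalize_dashes(df2018['ITEM'])

# After normalisation both labels are identical and merge into one row
merged = df2019.merge(df2018, how='outer', on='ITEM')
print(len(merged))  # 1
```

Applied to every yearly frame before merging, this would only fix spelling differences; items that were genuinely renamed between years (e.g. "real estate and real estate funds" up to 2017) still need a manual mapping.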
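I do the merge with how='outer' because some positions (line items) don't appear in every year, and I want to make sure every row of each table is collected, from left to right: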
df_assets_1 = df2021_assets.merge(
df2020_assets, how='outer', on='ITEM').merge(
df2019_assets, how='outer', on='ITEM').merge(
df2018_assets, how='outer', on='ITEM').merge(
df2017_assets, how='outer', on='ITEM').merge(
df2016_assets, how='outer', on='ITEM').merge(
df2015_assets, how='outer', on='ITEM').merge(
df2014_assets, how='outer', on='ITEM').merge(
df2013_assets, how='outer', on='ITEM')
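The chained merge above can also be written as a fold over a list of frames, which makes it easy to add a safety check. The sketch below swaps in functools.reduce and the validate='one_to_one' argument of DataFrame.merge, which raises pandas.errors.MergeError when ITEM is repeated in either frame instead of silently multiplying rows; the three small frames are stand-ins using values from the question.

```python
from functools import reduce

import pandas as pd

# Small stand-ins for the yearly frames (values from the question)
frames = [
    pd.DataFrame({'ITEM': ['goodwill', 'total assets'], '2019': [88303, 71356404]}),
    pd.DataFrame({'ITEM': ['goodwill', 'total assets'], '2018': [85588, 64508637]}),
    pd.DataFrame({'ITEM': ['goodwill', 'total assets'], '2017': [91692, 61196846]}),
]

# Fold the list left to right with the same outer merge on ITEM;
# validate='one_to_one' fails loudly if a label occurs twice in either side.
df_assets = reduce(
    lambda left, right: left.merge(right, how='outer', on='ITEM',
                                   validate='one_to_one'),
    frames,
)
print(df_assets)
```

With the real data, frames would be [df2021_assets, df2020_assets, ..., df2013_assets].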
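I tried to fix it by:
- making sure the item descriptions are stripped and all lower-case
- using .concat on axis=1 instead of .merge
- using a left join, but that cuts off positions from the right-hand table (as expected)
- trying without on='ITEM', but that made no difference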
Any help is much appreciated - thank you very much for your time!
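Rather than guessing which labels differ, the outer merge itself can report the rows that found no partner: indicator=True adds a categorical _merge column with the values 'both', 'left_only' or 'right_only'. A minimal sketch using labels from the printed 2018 and 2017 frames; printing with ascii() is an added suggestion that escapes non-ASCII characters (an en dash shows up as \u2013) and shows stray spaces inside the quotes.

```python
import pandas as pd

# Labels and values taken from the printed 2018 and 2017 frames
df2018 = pd.DataFrame({
    'ITEM': ['investment property', 'real estate funds', 'goodwill'],
    '2018': [1684932, 433899, 85588],
})
df2017 = pd.DataFrame({
    'ITEM': ['real estate and real estate funds', 'goodwill'],
    '2017': [1968702, 91692],
})

# indicator=True records for each output row which side(s) it came from
check = df2018.merge(df2017, how='outer', on='ITEM', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', ['ITEM', '_merge']]

# ascii() makes invisible differences (dash type, extra spaces) visible
for item, side in unmatched.itertuples(index=False):
    print(side, ascii(item))
```

Running this check for each pair of adjacent years lists every label that would end up as a separate, half-empty row.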
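I checked the join process in Power Query using M, and the data needs cleaning after almost every step because of the nature of financial results: there may be an overall 'title row' (e.g. shareholder value) and another row with the same name that holds the shareholder value total.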
So after each outer merge, possible duplicates have to be removed before moving on to the next outer merge. That is what caused the duplication.
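When a label such as a title row and its total occurs twice, merging on ITEM alone pairs every occurrence with every other one. Instead of dropping duplicates after each merge, an alternative technique (not the one described above) is to number repeated labels with groupby().cumcount() and merge on the label plus its occurrence number. A minimal sketch on hypothetical frames, assuming the title row and the total appear in the same order every year; add_occurrence and the OCC column are hypothetical names.

```python
import pandas as pd

# Hypothetical frames in which a title row and its total share one label
df2021 = pd.DataFrame({
    'ITEM': ['shareholder value', 'share capital', 'shareholder value'],
    '2021': [0, 100, 100],
})
df2020 = pd.DataFrame({
    'ITEM': ['shareholder value', 'share capital', 'shareholder value'],
    '2020': [0, 90, 90],
})

# Merging on ITEM alone: 2 x 2 = 4 'shareholder value' rows + 1 'share capital'
naive = df2021.merge(df2020, how='outer', on='ITEM')
print(len(naive))  # 5


def add_occurrence(df: pd.DataFrame) -> pd.DataFrame:
    """Number repeated labels 0, 1, ... in order of appearance (hypothetical helper)."""
    out = df.copy()
    out['OCC'] = out.groupby('ITEM').cumcount()
    return out


# (ITEM, OCC) is unique in each frame, so every row finds exactly one partner
keyed = add_occurrence(df2021).merge(
    add_occurrence(df2020), how='outer', on=['ITEM', 'OCC'])
print(len(keyed))  # 3
```

The OCC column can be dropped once all years are merged.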