pandas concat 比较具有相同列名的 2 个数据帧如何构建不同的列?

pandas concat compare 2 dataframes with same column names how to build a differ column?

我正在使用 pandas concat 来比较具有相同列和行的 2 个数据帧:

import pandas as pd

df=pd.read_csv(r'C:\Users\compare\T1.csv')
df2=pd.read_csv(r'C:\Users\compare\T2.csv')

index=['contract','RB','payee','fund']

df_all = pd.concat([df.set_index(index), df2.set_index(index)], 
                   axis='columns', keys=['First', 'Second'])

df_final = df_all.swaplevel(axis='columns')[df.columns[54:56]]

df_final

输出为:

                              SD1                SD2
                             First     Second    First   Second
contract    RB  payee   fund                
AG72916Z    2   1   W42      15622.9    15622.9  15622.9    15489.2
            4   1   W44      14697.8    14697.8  14697.8    14572.1
            8   1   W48      7388.56    7388.56  7388.56    7325.37
AL0024AZ    C3  1   202      226.585    226.59   220.366    220.37
            S3  1   204      804.059    804.06   781.99     781.99

我的问题是如何在每一秒后添加一个不同的列,以便我可以轻松判断比较结果,输出应如下所示:

                              SD1                         SD2
                             First     Second   differ    First     Second   differ
contract    RB  payee   fund                
AG72916Z    2   1   W42      15622.9    15622.9    0       15622.9  15489.2  133.7
            4   1   W44      14697.8    14697.8    0       14697.8  14572.1  125.7
            8   1   W48      7388.56    7388.56    0       7388.56  7325.37  63.19
AL0024AZ    C3  1   202      226.585    226.59     0.05    220.366  220.37  -0.004
            S3  1   204      804.059    804.06     0.01    781.99   781.99   0

有点棘手但必须继续订购:

out = df_final.stack(level=0).assign(Diff=lambda x: x['First'] - x['Second']) \
              .stack().unstack(level=[-2, -1])
print(out)

# Output
                              SD1                         SD2                   
                            First    Second   Diff      First    Second     Diff
contract RB payee fund                                                          
AG72916Z 2  1     W42   15622.900  15622.90  0.000  15622.900  15489.20  133.700
         4  1     W44   14697.800  14697.80  0.000  14697.800  14572.10  125.700
         8  1     W48    7388.560   7388.56  0.000   7388.560   7325.37   63.190
AL0024AZ C3 1     202     226.585    226.59 -0.005    220.366    220.37   -0.004
         S3 1     204     804.059    804.06 -0.001    781.990    781.99    0.000

更新

What if I want to only select the row that Diff large than 100

使用:

>>> out[out.loc[:, (slice(None), 'Diff')].gt(100).any(1)]

                            SD1                    SD2                
                          First   Second Diff    First   Second   Diff
contract RB payee fund                                                
AG72916Z 2  1     W42   15622.9  15622.9  0.0  15622.9  15489.2  133.7
         4  1     W44   14697.8  14697.8  0.0  14697.8  14572.1  125.7

# Same result with
# idx = pd.IndexSlice
# out[out.loc[:, idx[:, 'Diff']].gt(100).any(1)]