How to highlight unmatched rows and update a marker column in Excel using Python?

Hi everyone, I'm teaching myself to program and have run into a problem.

  1. I have two different Excel files to compare:

Data1.xlsx

|  Name   |  Reg Date  |
|Annie    | 2021-07-01 |
|Billy    | 2021-07-02 |
|Cathrine | 2021-07-03 |
|David    | 2021-07-04 |
|Eric     | 2021-07-04 |

Data2.xlsx

|  Name   |   City    |  Reg Date  | Gender | Data1.xlsx |
|Alex     | Hong Kong | 2021-07-04 | Male   |            |
|Annie    | Hong Kong | 2021-07-01 | Female |            |
|Bob      | Taipei    | 2021-07-02 | Male   |            |
|Lucy     | Tokyo     | 2021-07-01 | Female |            |
|David    | London    | 2021-07-04 | Male   |            |
|Kate     | New York  | 2021-07-03 | Female |            |
|Cathrine | London    | 2021-07-03 | Female |            |
|Rose     | Hong Kong | 2021-07-04 | Female |            |

  1. I use 'Name' and 'Reg Date' as the merge keys:

    import pandas as pd 
    dt1 = pd.read_excel('Data1.xlsx')
    dt2 = pd.read_excel('Data2.xlsx')
    df_merge = pd.merge(dt1.iloc[:, [0, 1]], dt2.iloc[:, [0, 2]], on=['Name', 'Reg Date'], how='outer', indicator=True)
    
    i = 0
    rows_to_color = []
    
    for a in df_merge.iloc[:, [2]].values:
        if a == 'both':
           rows_to_color.append(i)
        i += 1
    
    
    |  Name   |  Reg Date  |   _merge   |
    |Alex     | 2021-07-04 | right_only |
    |Annie    | 2021-07-01 | both       |
    |Billy    | 2021-07-02 | left_only  | 
    |Bob      | 2021-07-02 | right_only |
    |Lucy     | 2021-07-01 | right_only |
    |David    | 2021-07-04 | both       |
    |Eric     | 2021-07-04 | left_only  |
    |Kate     | 2021-07-03 | right_only |
    |Cathrine | 2021-07-03 | both       |
    |Rose     | 2021-07-04 | right_only |
    
  2. I tried to write code that highlights the 'left_only' and 'right_only' rows in 'Data2.xlsx', but it doesn't work.

    def bg_color(col):
    color = '#ffffff'
    return 'background-color: %s' % color
    if i in rows_to_color:
        for i, x in col.iteritems():
            styled = df_merge.style.apply(bg_color)
    
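  3. I don't know how to highlight the unmatched rows in 'Data2.xlsx' and mark them 'Y/N'. The image below shows my expected result. Could you show me how to code this?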

    [image: expected result]

Use a left join in merge, then set Y/N with numpy.where first:

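import numpy as np
import pandas as pd

# swap the order (dt2 first, then dt1) so the left join keeps every row of Data2.xlsx
df_merge = pd.merge(dt2, 
                    dt1[['Name', 'Reg Date']], 
                    on=['Name', 'Reg Date'], 
                    how='left', indicator=True)
# '_merge' is 'both' when the (Name, Reg Date) pair also exists in Data1.xlsx
df_merge['Data1.xlsx'] = np.where(df_merge.pop('_merge').eq('both'), 'Y', 'N')
print(df_merge)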
       Name       City    Reg Date  Gender Data1.xlsx
0      Alex  Hong Kong  2021-07-04    Male          N
1     Annie  Hong Kong  2021-07-01  Female          Y
2       Bob     Taipei  2021-07-02    Male          N
3      Lucy      Tokyo  2021-07-01  Female          N
4     David     London  2021-07-04    Male          Y
5      Kate   New York  2021-07-03  Female          N
6  Cathrine     London  2021-07-03  Female          Y
7      Rose  Hong Kong  2021-07-04  Female          N
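
To try the merge step without the Excel files, here is a minimal sketch that rebuilds the two tables in memory. The values are copied from the question; storing the dates as plain strings and calling `drop_duplicates` on the key columns are my own assumptions (the second one keeps a repeated name/date pair in Data1 from adding extra rows to the left join). With real files, both 'Reg Date' columns need the same dtype for the keys to match.

```python
import numpy as np
import pandas as pd

# In-memory stand-ins for Data1.xlsx and Data2.xlsx (values taken from the question).
dt1 = pd.DataFrame({
    "Name": ["Annie", "Billy", "Cathrine", "David", "Eric"],
    "Reg Date": ["2021-07-01", "2021-07-02", "2021-07-03",
                 "2021-07-04", "2021-07-04"],
})
dt2 = pd.DataFrame({
    "Name": ["Alex", "Annie", "Bob", "Lucy",
             "David", "Kate", "Cathrine", "Rose"],
    "City": ["Hong Kong", "Hong Kong", "Taipei", "Tokyo",
             "London", "New York", "London", "Hong Kong"],
    "Reg Date": ["2021-07-04", "2021-07-01", "2021-07-02", "2021-07-01",
                 "2021-07-04", "2021-07-03", "2021-07-03", "2021-07-04"],
    "Gender": ["Male", "Female", "Male", "Female",
               "Male", "Female", "Female", "Female"],
})

keys = ["Name", "Reg Date"]

# Assumption: a key pair that appears twice in Data1 should still count once,
# so the left join cannot duplicate rows of dt2.
lookup = dt1[keys].drop_duplicates()

merged = dt2.merge(lookup, on=keys, how="left", indicator=True)

# '_merge' is 'both' for rows found in Data1 and 'left_only' otherwise.
merged["Data1.xlsx"] = np.where(merged.pop("_merge").eq("both"), "Y", "N")
print(merged)
```

Because the join is a left join on dt2, the result keeps the row order and row count of Data2.xlsx, which is what makes it safe to write the flag straight back as a new column.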

Then color the rows marked N:

def bg_color(x):
    c = 'background-color: yellow'
    # condition
    m = x["Data1.xlsx"].eq('N')
    # DataFrame of styles
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)

    # set columns by condition
    return df1.mask(m, c)

styled = df_merge.style.apply(bg_color, axis=None)

styled.to_excel('styled.xlsx', engine='openpyxl', index=False)