Highlight cells in a different color if they are not exact duplicates

Question

I have this dataframe. If the Description is the same, the rest of the job entry should be exactly the same.

import numpy as np
import pandas as pd

mycol = ['Title', 'Location', 'Company', 'Salary', 'Sponsored', 'Description']
mylist = [
    ('a', 'b', 'c', 'd', 'e', 'f'),
    ('a', 'b', 'c', 'd2', 'e', 'f'),
    ('g', 'h', 'i', 'j', 'k', 'l'),
    ('g1', 'h', 'i', 'j', 'k', 'l'),
    ('n', 'o', 'p', 'q', 'r', 's'),
    ('n1', 'o', 'p', 'q', 'r', 's'),
]

df = pd.DataFrame(mylist, columns=mycol)

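I want to highlight the differences with a yellow background, as shown in the image...

Is this possible in pandas?

Alternatively, I could export to Excel and process the data with VBA. What I am trying to do is achieve this in pandas and then export to Excel with the formatting included.

Update:

Someone suggested using this: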

# Select all columns except Description
cols = df.columns.symmetric_difference(['Description'])
# Clear those columns on every row where Description is a duplicate
df.loc[df['Description'].duplicated(), cols] = np.nan
# Forward fill the blanks
df = df.ffill()

But it replaces the values instead of highlighting them.
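To make the problem concrete, here is a minimal sketch that runs the suggested snippet on the sample data and compares the result with an untouched copy. The names `original` and `changed` are illustrative and are not part of the suggestion.

```python
import numpy as np
import pandas as pd

mycol = ['Title', 'Location', 'Company', 'Salary', 'Sponsored', 'Description']
mylist = [
    ('a', 'b', 'c', 'd', 'e', 'f'),
    ('a', 'b', 'c', 'd2', 'e', 'f'),
    ('g', 'h', 'i', 'j', 'k', 'l'),
    ('g1', 'h', 'i', 'j', 'k', 'l'),
    ('n', 'o', 'p', 'q', 'r', 's'),
    ('n1', 'o', 'p', 'q', 'r', 's'),
]
df = pd.DataFrame(mylist, columns=mycol)

# Keep an untouched copy so the overwritten cells can be located afterwards
original = df.copy()

# The suggested snippet: blank out duplicate rows, then forward fill
cols = df.columns.symmetric_difference(['Description'])
df.loc[df['Description'].duplicated(), cols] = np.nan
df = df.ffill()

# True wherever the snippet silently changed a value
changed = original.ne(df)
print(changed.sum().sum())  # 3 cells were overwritten
```

`changed` is True exactly at the cells that were overwritten ('d2', 'g1' and 'n1'), which is the information a highlight needs. The data itself should stay as it was.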

Answer 1

We can clear the rows where Description is duplicated, then use groupby ffill to forward fill the values within each Description:

mask = df.copy(deep=True)
# Select all columns except Description
cols = mask.columns.symmetric_difference(['Description'])
# Clear those columns on every row where Description is a duplicate
mask.loc[mask['Description'].duplicated(), cols] = np.nan
# Forward fill the blanks within each Description group
mask = mask.groupby(df['Description'].values).ffill()

mask:

  Title Location Company Salary Sponsored Description
0     a        b       c      d         e           f
1     a        b       c      d         e           f
2     g        h       i      j         k           l
3     g        h       i      j         k           l
4     n        o       p      q         r           s
5     n        o       p      q         r           s
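As a cross-check, the same reference frame can probably be built in one step with `groupby(...).transform('first')`. This is a swapped-in technique, not the answer's method, and the name `reference` is illustrative. Because `first` skips missing values, the two approaches should agree only when the first row of each group has no NaN, which holds for the sample data.

```python
import pandas as pd

mycol = ['Title', 'Location', 'Company', 'Salary', 'Sponsored', 'Description']
mylist = [
    ('a', 'b', 'c', 'd', 'e', 'f'),
    ('a', 'b', 'c', 'd2', 'e', 'f'),
    ('g', 'h', 'i', 'j', 'k', 'l'),
    ('g1', 'h', 'i', 'j', 'k', 'l'),
    ('n', 'o', 'p', 'q', 'r', 's'),
    ('n1', 'o', 'p', 'q', 'r', 's'),
]
df = pd.DataFrame(mylist, columns=mycol)

# Group by the Description values (passed as an array so every column is kept)
# and broadcast the first value of each column back to every row of its group.
reference = df.groupby(df['Description'].values).transform('first')
print(reference)
```

Grouping by the array `df['Description'].values` rather than by the column name keeps Description in the result, so `reference` has the same shape and column order as `df`.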

This can serve as our reference for the comparison:

styles = (
    # Keep the reference value only where it differs from df
    mask.where(mask.ne(df))
        # Back fill within each group so earlier rows of the group are marked too
        .groupby(df['Description'].values).bfill()
        # True wherever a cell is now populated
        .notnull()
        # Replace booleans with CSS styling
        .replace({True: 'background-color: yellow;', False: ''})
)

df.style.apply(lambda _: styles, axis=None)

where and groupby bfill give us:

mask.where(mask.ne(df)).groupby(df['Description'].values).bfill()

  Title Location Company Salary Sponsored Description
0   NaN      NaN     NaN      d       NaN         NaN
1   NaN      NaN     NaN      d       NaN         NaN
2     g      NaN     NaN    NaN       NaN         NaN
3     g      NaN     NaN    NaN       NaN         NaN
4     n      NaN     NaN    NaN       NaN         NaN
5     n      NaN     NaN    NaN       NaN         NaN

Then notnull and replace turn this into the style strings.

styles:

                       Title Location Company                     Salary Sponsored Description
0                                              background-color: yellow;                      
1                                              background-color: yellow;                      
2  background-color: yellow;                                                                  
3  background-color: yellow;                                                                  
4  background-color: yellow;                                                                  
5  background-color: yellow;
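The chain above can also be wrapped in a small helper. The following is a sketch under assumptions, not the answer's code: `build_styles` and its parameters are hypothetical names, and it uses `transform('first')` and `transform('any')` in place of the ffill and bfill steps.

```python
import numpy as np
import pandas as pd

def build_styles(df, key, css='background-color: yellow;'):
    """Return a DataFrame of CSS strings shaped like df (hypothetical helper)."""
    groups = df[key].values
    # Reference value for each cell: the first value in its group
    reference = df.groupby(groups).transform('first')
    # Cells that deviate from the reference
    differs = df.ne(reference)
    # Flag the whole group in any column where at least one row deviates
    flagged = differs.groupby(groups).transform('any')
    return pd.DataFrame(np.where(flagged, css, ''),
                        index=df.index, columns=df.columns)

mycol = ['Title', 'Location', 'Company', 'Salary', 'Sponsored', 'Description']
mylist = [
    ('a', 'b', 'c', 'd', 'e', 'f'),
    ('a', 'b', 'c', 'd2', 'e', 'f'),
    ('g', 'h', 'i', 'j', 'k', 'l'),
    ('g1', 'h', 'i', 'j', 'k', 'l'),
    ('n', 'o', 'p', 'q', 'r', 's'),
    ('n1', 'o', 'p', 'q', 'r', 's'),
]
df = pd.DataFrame(mylist, columns=mycol)

styles = build_styles(df, 'Description')
print(styles)
```

The result can be passed to `df.style.apply(lambda _: styles, axis=None)` in the same way as the `styles` frame built above. One design difference: `transform('any')` marks every row of a group, whereas the bfill version only marks the differing row and the rows above it. For the two-row groups in the sample data the output is the same.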

Remember to call to_excel on the Styler object, not on the DataFrame:

df.style.apply(lambda _: styles, axis=None).to_excel('out.xlsx')

Answer 2

Someone came up with this answer.

mask = df.copy(deep=True)
# Select all columns except Description
cols = mask.columns.symmetric_difference(["Description"])
# Clear those columns on every row where Description is a duplicate
mask.loc[mask["Description"].duplicated(), cols] = np.nan
# Forward fill the blanks within each Description group
mask = mask.groupby(df["Description"].values).ffill()

Compare the mask dataframe with the original dataframe, then apply the styles.

styles = (
    # Keep the reference value only where it differs from df
    mask.where(mask.ne(df))
    # Back fill within each group so earlier rows of the group are marked too
    .groupby(df["Description"].values).bfill()
    # True wherever a cell is now populated
    .notnull()
    # Replace booleans with CSS styling
    .replace({True: "background-color: yellow;", False: ""})
)

df.style.apply(lambda _: styles, axis=None)

This works as expected.

