Highlight cells in a different color if rows are not exact duplicates
I have this dataframe.
If the Description is the same, the job entries should be exactly the same.
import pandas as pd

mycol = ['Title', 'Location', 'Company', 'Salary', 'Sponsored', 'Description']
mylist = [('a', 'b', 'c', 'd', 'e', 'f'),
          ('a', 'b', 'c', 'd2', 'e', 'f'),
          ('g', 'h', 'i', 'j', 'k', 'l'),
          ('g1', 'h', 'i', 'j', 'k', 'l'),
          ('n', 'o', 'p', 'q', 'r', 's'),
          ('n1', 'o', 'p', 'q', 'r', 's')]
df = pd.DataFrame(mylist, columns=mycol)
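I want to highlight the differences with a yellow background, as shown in the image...
Is this possible in pandas?
Alternatively, I could export to Excel and process it with VBA. I am trying to do this in pandas and then export to Excel with the formatting included.
Update:
Someone suggested this: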
# Select all columns except Description
cols = df.columns.symmetric_difference(['Description'])
# Clear all columns in rows where Description is duplicated
df.loc[df['Description'].duplicated(), cols] = np.nan
# Fill forward over the blanks
df = df.ffill()
But it replaces the values instead of highlighting them.
We can clear the rows where Description is duplicated, then use groupby ffill to forward fill the values per Description:
import numpy as np

mask = df.copy(deep=True)
# Select all columns except Description
cols = mask.columns.symmetric_difference(['Description'])
# Clear all columns in rows where Description is duplicated
mask.loc[mask['Description'].duplicated(), cols] = np.nan
# Fill forward over the blanks, one Description group at a time
mask = mask.groupby(df['Description'].values).ffill()
mask:
   Title  Location  Company  Salary  Sponsored  Description
0      a         b        c       d          e            f
1      a         b        c       d          e            f
2      g         h        i       j          k            l
3      g         h        i       j          k            l
4      n         o        p       q          r            s
5      n         o        p       q          r            s
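The grouped forward fill matters when rows sharing a Description are not adjacent. A minimal sketch of the difference, assuming a small hypothetical frame (demo, with only three columns) in which the two 'f' rows are separated by an 'l' row:

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 0 and 2 share Description 'f', row 1 sits between them.
demo = pd.DataFrame(
    [("a", "d", "f"), ("g", "j", "l"), ("a", "d2", "f")],
    columns=["Title", "Salary", "Description"],
)

blanked = demo.copy(deep=True)
cols = blanked.columns.difference(["Description"])
# Blank out every row whose Description has already been seen.
blanked.loc[blanked["Description"].duplicated(), cols] = np.nan

# A plain ffill copies from the previous row, whatever its Description is.
plain = blanked.ffill()
# A grouped ffill copies only from earlier rows with the same Description.
grouped = blanked.groupby(demo["Description"].values).ffill()

print(plain)
print(grouped)
```

With the plain fill, row 2 picks up the values of row 1, which belongs to a different Description. With the grouped fill it picks up the values of row 0. In the question's data the duplicates are adjacent, so both fills happen to agree there.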
This can serve as our point of comparison:
styles = (
    # Keep only the mask values that differ from the original
    mask.where(mask.ne(df))
    # Back fill per group
    .groupby(df['Description'].values).bfill()
    # Flag every cell that is not null
    .notnull()
    # Replace booleans with styling
    .replace({True: 'background-color: yellow;', False: ''})
)
df.style.apply(lambda _: styles, axis=None)
where and groupby bfill give us:
mask.where(mask.ne(df)).groupby(df['Description'].values).bfill()
   Title  Location  Company  Salary  Sponsored  Description
0    NaN       NaN      NaN       d        NaN          NaN
1    NaN       NaN      NaN       d        NaN          NaN
2      g       NaN      NaN     NaN        NaN          NaN
3      g       NaN      NaN     NaN        NaN          NaN
4      n       NaN      NaN     NaN        NaN          NaN
5      n       NaN      NaN     NaN        NaN          NaN
Then notnull and replace let us set the styles:
styles:
                       Title  Location  Company                     Salary  Sponsored  Description
0                                                background-color: yellow;
1                                                background-color: yellow;
2  background-color: yellow;
3  background-color: yellow;
4  background-color: yellow;
5  background-color: yellow;
Remember to call to_excel on the Styler object, not on the DataFrame:
df.style.apply(lambda _: styles, axis=None).to_excel('out.xlsx')
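As an alternative sketch (not the method from the answer above), the same styles frame can be built by counting distinct values per column within each Description group. The function name diff_styles and its parameters are hypothetical, and the frame is rebuilt here only so the example is self-contained:

```python
import numpy as np
import pandas as pd

def diff_styles(df, key="Description", css="background-color: yellow;"):
    """Return a frame of CSS strings flagging columns that vary within a key group."""
    # Number of distinct values per column in each group, broadcast back to every row.
    counts = df.groupby(key).transform("nunique")
    # More than one distinct value means the group disagrees in that column.
    # The key column is absent from the transform result, so add it back as False.
    differs = counts.gt(1).reindex(columns=df.columns, fill_value=False)
    return pd.DataFrame(np.where(differs, css, ""), index=df.index, columns=df.columns)

mycol = ["Title", "Location", "Company", "Salary", "Sponsored", "Description"]
mylist = [
    ("a", "b", "c", "d", "e", "f"),
    ("a", "b", "c", "d2", "e", "f"),
    ("g", "h", "i", "j", "k", "l"),
    ("g1", "h", "i", "j", "k", "l"),
    ("n", "o", "p", "q", "r", "s"),
    ("n1", "o", "p", "q", "r", "s"),
]
df = pd.DataFrame(mylist, columns=mycol)
styles = diff_styles(df)
```

The result can be passed to the Styler the same way, with df.style.apply(lambda _: styles, axis=None). This variant flags every row of a group in any column where the group disagrees, regardless of where in the group the odd value sits.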
Someone proposed this answer.
mask = df.copy(deep=True)
# Select all columns except Description
cols = mask.columns.symmetric_difference(["Description"])
# Clear all columns in rows where Description is duplicated
mask.loc[mask["Description"].duplicated(), cols] = np.nan
# Fill forward over the blanks, one Description group at a time
mask = mask.groupby(df["Description"].values).ffill()
Compare the mask dataframe with the original dataframe, then apply the styles.
styles = (
    # Keep only the mask values that differ from the original
    mask.where(mask.ne(df))
    # Back fill per group
    .groupby(df["Description"].values).bfill()
    # Flag every cell that is not null
    .notnull()
    # Replace booleans with styling
    .replace({True: "background-color: yellow;", False: ""})
)
df.style.apply(lambda _: styles, axis=None)
This works as expected.