Pandas Styling with Conditional Rules
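I am trying to style a Pandas DataFrame using two different columns.
As long as the condition concerns the column itself, it works, but when it depends on another column I cannot get the desired result.
I want to color the cell in "Date II" (column `Date PII`) if "Date I" (column `Date PI`) is already in the past.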
def date_pii(row):
    ret = ["" for _ in row.index]
    print(row['Date PI'])
    if row['Date PI'] < datetime.now():
        ret[row.index.get_loc("Date PII")] = "background-color: red"
    return ret

styler = df3.style \
    .applymap(lambda x: 'background-color: %s' % 'red' if x <= datetime.now() else '', subset=['Date PI']) \
    .applymap(lambda x: 'background-color: %s' % 'yellow' if x < datetime.now() + timedelta(days=30) else '',
              subset=['Date PII']) \
    .applymap(lambda x: 'background-color: %s' % 'orange' if x <= datetime.now() else '', subset=['Date PII']) \
    .applymap(lambda x: 'background-color: %s' % 'grey' if pd.isnull(x) else '', subset=['Date PI']) \
    .applymap(lambda x: 'background-color: %s' % 'grey' if pd.isnull(x) else '', subset=['Date PII']) \
    .apply(date_pii, axis=1)  # <---- THIS IS THE ISSUE
styler.to_excel(writer, sheet_name='Report Paris', index=False)
At runtime I get the following error:
ValueError: Function <function generate_report_all.<locals>.date_pii at 0x7fd3964d9160> returned the wrong shape.
Result has shape: (532,)
Expected shape: (532, 10)
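The DataFrame looks like this:
[screenshot of the styled DataFrame]
The first orange cell in "Date PII" is correct, but I would like the remaining ones (wherever PI is red) to be colored red as well.
Thanks for your help!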
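The general approach to this kind of problem is to pass the relevant columns as a `subset` to `Styler.apply`. This lets us create the styles at the DataFrame level and use `loc` indexing to build them up based on conditions. The other major benefit is that, rather than chaining calls, we can use the extra space to document each rule and avoid the overhead of all those lambdas: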
def style_dates(subset_df):
    # Empty Styles
    style_df = pd.DataFrame(
        '', index=subset_df.index, columns=subset_df.columns
    )
    # Today's Date
    today = pd.Timestamp.now().normalize()
    # Date PII is within 30 days from today
    style_df.loc[
        subset_df['Date PII'].le(today + pd.Timedelta(days=30)),
        'Date PII'
    ] = 'background-color: yellow'
    # Date PI is before today
    style_df.loc[
        subset_df['Date PI'].lt(today),
        ['Date PI', 'Date PII']
    ] = 'background-color: red'
    # Date PII is before today and Date PI is after Today
    style_df.loc[
        subset_df['Date PII'].lt(today) & subset_df['Date PI'].gt(today),
        'Date PII'
    ] = 'background-color: orange'
    # Either is NaN
    style_df[subset_df.isna()] = 'background-color: gray'
    return style_df

styler = df3.style.apply(
    style_dates, axis=None, subset=['Date PII', 'Date PI']
).format(
    # Optional Explicit Date Format
    formatter='{:%Y-%m-%d}', na_rep='NaT', subset=['Date PII', 'Date PI']
)
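Setup: a randomly generated DataFrame whose dates are always relative to the current date (the styles will stay consistent, while the dates will not):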
import numpy as np
import pandas as pd
from numpy.random import Generator, MT19937

norm_today = pd.Timestamp.now().normalize()
rng = Generator(MT19937(1023))

def md(lower_bound, upper_bound, rng_=rng):
    return pd.Timedelta(days=rng_.integers(lower_bound, upper_bound))

df3 = pd.DataFrame({
    'Desc': [
        'PII within 30 days',  # PII yellow
        'PII in past and PI in future',  # PII orange
        'PI past',  # Both red
        'PI empty',  # grey
        'PII empty',  # grey
        'PII in future but not within 30 days'  # No Styles
    ],
    'Date PII': [norm_today + md(1, 10), norm_today - md(1, 10),
                 norm_today, norm_today, np.nan,
                 norm_today + md(40, 50)],
    'Date PI': [norm_today, norm_today + md(1, 10),
                norm_today - md(1, 10), np.nan, norm_today,
                norm_today]
})
Desc | Date PII | Date PI |
---|---|---|
PII within 30 days | 2021-11-06 00:00:00 | 2021-11-03 00:00:00 |
PII in past and PI in future | 2021-10-31 00:00:00 | 2021-11-11 00:00:00 |
PI past | 2021-11-03 00:00:00 | 2021-11-01 00:00:00 |
PI empty | 2021-11-03 00:00:00 | NaT |
PII empty | NaT | 2021-11-03 00:00:00 |
PII in future but not within 30 days | 2021-12-19 00:00:00 | 2021-11-03 00:00:00 |
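Because `style_dates` reads the current date internally, its output is hard to check against a fixed example. The following is only a minimal sketch, assuming a hypothetical variant named `style_dates_at` that takes `today` as an argument (the function name, the parameter, and the small sample frame are illustrative and not part of the original answer). It applies the same rules in the same order, so later assignments override earlier ones:

```python
import pandas as pd


def style_dates_at(subset_df, today):
    """Hypothetical variant of style_dates with an explicit reference date."""
    today = pd.Timestamp(today).normalize()
    # Start with no styling; same shape and labels as the data passed in
    styles = pd.DataFrame('', index=subset_df.index, columns=subset_df.columns)
    pii, pi = subset_df['Date PII'], subset_df['Date PI']
    # Date PII is no later than 30 days from today
    styles.loc[pii.le(today + pd.Timedelta(days=30)), 'Date PII'] = 'background-color: yellow'
    # Date PI is before today: color both cells
    styles.loc[pi.lt(today), ['Date PI', 'Date PII']] = 'background-color: red'
    # Date PII is before today while Date PI is still in the future
    styles.loc[pii.lt(today) & pi.gt(today), 'Date PII'] = 'background-color: orange'
    # Missing dates
    styles[subset_df.isna()] = 'background-color: gray'
    return styles


# Fixed dates chosen for illustration only
fixed_today = pd.Timestamp('2021-11-03')
sample = pd.DataFrame({
    'Date PII': pd.to_datetime(['2021-11-06', '2021-10-31', '2021-11-03', None, '2021-12-19']),
    'Date PI': pd.to_datetime(['2021-11-03', '2021-11-11', '2021-11-01', '2021-11-03', '2021-11-03']),
})
result = style_dates_at(sample, fixed_today)
print(result)
```

Since `Styler.apply` forwards extra keyword arguments to the function, the same helper could be used as `df3.style.apply(style_dates_at, axis=None, subset=['Date PII', 'Date PI'], today=fixed_today)`.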
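While @HenryEcker's solution works at the DataFrame level (note his use of the `axis=None` keyword argument), a simpler approach is sometimes preferable.
Since your conditions depend only on values within the same row, you can use `apply` with `axis=1` and pass a function that evaluates the column values of each row.
For example: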
from pandas import DataFrame

df = DataFrame([[1, 2, 3], [3, 2, 1]], index=["i", "j"], columns=["A", "B", "C"])

   A  B  C
i  1  2  3
j  3  2  1
Suppose we want to highlight column `C` if it is less than column `A`:
def highlight(s):
    # s holds only the subset columns, in order: ["A", "C"]
    if s["C"] < s["A"]:
        return ["", "color: red;"]
    return ["", ""]

df.style.apply(highlight, subset=["A", "C"], axis=1)
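Applied to the date columns from the question, the row-wise idea might look like the sketch below. It assumes a hypothetical helper `highlight_pii` and a fixed `today` passed in as an argument (both are illustrative names, not from the question or the answers). The function returns one CSS string per entry of the row it receives, so the result always lines up with the columns given in `subset`. The demonstration uses plain `DataFrame.apply` so the returned styles can be inspected directly:

```python
import pandas as pd


def highlight_pii(row, today):
    """Hypothetical row-wise rule: color Date PII red when Date PI is in the past."""
    # One (initially empty) style per column of the row that was passed in
    styles = pd.Series('', index=row.index)
    if pd.notna(row['Date PI']) and row['Date PI'] < today:
        styles['Date PII'] = 'background-color: red'
    return styles


# Fixed dates chosen for illustration only
today = pd.Timestamp('2021-11-03')
dates = pd.DataFrame({
    'Date PII': pd.to_datetime(['2021-11-06', '2021-11-03']),
    'Date PI': pd.to_datetime(['2021-11-03', '2021-11-01']),
})
# Extra keyword arguments are forwarded to the function
row_styles = dates.apply(highlight_pii, axis=1, today=today)
print(row_styles)
```

With a Styler, the equivalent call would be `df3.style.apply(highlight_pii, axis=1, subset=['Date PII', 'Date PI'], today=pd.Timestamp.now().normalize())`, where `subset` restricts each row to the two date columns the function expects.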