Highlight python pandas dataframe if certain values are found
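This comes from the last example in Chapter 7 of 'Pandas Cookbook', which uses the flights.csv dataset. The objective is to find the longest delay streak for each airline and origin airport combination. I made a few small modifications of my own.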
def max_delay_streak(df):
    df = df.reset_index(drop=True)
    s = 1 - df['ON_TIME']
    s1 = s.cumsum()
    streak = s.mul(s1).diff().where(lambda x: x < 0).ffill().add(s1, fill_value=0)
    df['streak'] = streak
    last_idx = streak.idxmax()
    max_streak = streak.max()
    # my slight modification here to accommodate delay streak equals 0
    if max_streak == 0:
        first_idx = 0
    else:
        first_idx = last_idx - max_streak + 1
    df_return = df.loc[[first_idx, last_idx], ['MONTH', 'DAY']]
    df_return['streak'] = max_streak
    df_return.index = ['first', 'last']
    df_return.index.name = 'streak_row'
    # search and operate zero streak
    # my adjustment to find index where there is no delay streak
    # df_return[df_return['streak'] == 0].index
    # gets the MultiIndex([('EV', 'PHX', 'first'), ('EV', 'PHX', 'last')],
    #                     names=['AIRLINE', 'ORG_AIR', 'streak_row'])
    no_streak = df_return[df_return['streak'] == 0].index
    # get the data from respective index and return month/day into '-'
    df_return.loc[no_streak, ['MONTH', 'DAY']] = '-'
    return df_return
flights.sort_values(['MONTH','DAY','SCHED_DEP']).groupby(['AIRLINE','ORG_AIR']).apply(max_delay_streak)
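To make the chained streak expression easier to follow, here is a minimal sketch that runs the same steps on a toy Series. The ON_TIME values below are made up for illustration and are not taken from flights.csv; the intermediate name offset is also just for this example.

```python
import pandas as pd

# Toy ON_TIME flags (1 = on time, 0 = delayed); values are made up.
on_time = pd.Series([1, 0, 0, 1, 0, 0, 0, 1])

s = 1 - on_time   # 1 where the flight was delayed
s1 = s.cumsum()   # running count of delayed flights

# s * s1 drops back to 0 at an on-time flight that ends a delayed run,
# so diff() is negative there: minus the number of delays seen so far.
# ffill() carries that offset forward until the next reset.
offset = s.mul(s1).diff().where(lambda x: x < 0).ffill()

# Adding the running count to the offset restarts the count after each reset.
streak = offset.add(s1, fill_value=0)

print(streak.tolist())  # [0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 3.0, 0.0]
```

Here streak.idxmax() is 6 and streak.max() is 3.0, so first_idx = 6 - 3 + 1 = 4, which is the first row of the longest delayed run.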
The code works fine up to this point. Next, I tried to highlight in yellow the rows where the delay streak is 0 (or any other number).
desired_result
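I tried 2 approaches. Both ran without errors, but both produced the original DataFrame with nothing highlighted.
Approach 1: Reuse the .loc logic from the last line of the function above, using the index to select specific rows and add color.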
df_return.loc[no_streak].style.apply('background-color: yellow',axis=1)
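Approach 2: The ugly way. I tried to extract every (airline, origin airport, first/last) index entry and check it against the zero-delay-streak index stored in the variable 'no_streak' (in this case ('EV', 'PHX', 'first'), ('EV', 'PHX', 'last')). If the condition is met, apply the color.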
df_return.style.apply(['background-color: yellow' for x in list(df_return.index) if x in list(no_streak)], axis=1)
Why doesn't my code produce the desired output?
Is there a way to achieve this?
Perform the styling outside the max_delay_streak() function.
import pandas as pd
flights = pd.read_csv('flights.csv')
flights['ON_TIME'] = flights['ARR_DELAY'].lt(15).astype(int)
flights_agg = flights.sort_values(['MONTH', 'DAY', 'SCHED_DEP']).groupby(['AIRLINE', 'ORG_AIR']).apply(max_delay_streak)
flights_agg.style.apply(lambda x: ['background-color: yellow']*3 if x.streak == 0 else ['background-color: default']*3, axis=1)
where max_delay_streak() is the function defined in the question.
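For reference, Styler.apply expects a function that returns one CSS string per cell, not a bare string or a list, which is one reason the two attempts in the question do not highlight anything. Below is a minimal sketch of the row-wise approach on a toy frame. The toy data and the names highlight_streak, target and color are assumptions for illustration, not part of the original code.

```python
import pandas as pd

def highlight_streak(row, target=0, color="yellow"):
    """Return one CSS string per cell of the row.

    Cells get a background color when the row's 'streak' equals target,
    and an empty string (no styling) otherwise.
    """
    css = f"background-color: {color}" if row["streak"] == target else ""
    return [css] * len(row)

# Toy stand-in for the grouped result; values are made up.
idx = pd.MultiIndex.from_tuples(
    [("AA", "ATL", "first"), ("AA", "ATL", "last"),
     ("EV", "PHX", "first"), ("EV", "PHX", "last")],
    names=["AIRLINE", "ORG_AIR", "streak_row"],
)
toy = pd.DataFrame(
    {"MONTH": [1, 1, "-", "-"], "DAY": [3, 5, "-", "-"],
     "streak": [3.0, 3.0, 0.0, 0.0]},
    index=idx,
)

# Inspect the CSS the function would produce for each row.
row_css = [highlight_streak(row) for _, row in toy.iterrows()]
```

In a notebook, toy.style.apply(highlight_streak, axis=1, target=0) should render the same frame with the two EV/PHX rows in yellow (the Styler needs jinja2 installed). Using len(row) instead of a fixed 3 keeps the function valid if columns are added later.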
The code is not elegant, but I finally got what I wanted. Enter a number, and the DataFrame highlights every row that matches the search criterion, as shown in the screenshot.
df = flights.sort_values(['MONTH','DAY','SCHED_DEP']).groupby(['AIRLINE','ORG_AIR']).apply(max_delay_streak)
streak_no = input("Enter streak no: ")
streak_no = int(streak_no)
color_dict = {"AA": "lightcoral", "AS": "orangered", "B6": "orange", "DL": "yellow", "EV": "lawngreen", "F9": "palegreen", "HA": "lightcyan", "MQ": "aqua",
              "NK": "skyblue", "OO": "lightsteelblue", "UA": "lavender", "US": "violet", "VX": "magenta", "WN": "pink"}
# first level coloring
# get the first level index value
first_level_index = df.index.get_level_values(0)
# get unique first level value where criteria is met
no_streak_row_unique = df[df['streak'] == streak_no].index.get_level_values(0).unique()
# decide which row in first level to color
first_level_color_arrangement = [
    {'selector': f'.row{i}.level0', 'props': [('background-color', color_dict[j])]}
    if j in no_streak_row_unique
    else {'selector': f'.row{i}.level0', 'props': [('background-color', 'default')]}
    for i, j in enumerate(first_level_index)
]
# second level unique
second_level_index = list(zip(df.index.get_level_values(0), df.index.get_level_values(1)))
# no_streak_row_2_unique
no_streak_row_2_unique = list(set(zip(df[df['streak'] == streak_no].index.get_level_values(0),
                                      df[df['streak'] == streak_no].index.get_level_values(1))))
second_level_color_arrangement = [
    {'selector': f'.row{i}', 'props': [('background-color', color_dict[j[0]])]}
    if j in no_streak_row_2_unique
    else {'selector': f'.row{i}', 'props': [('background-color', 'default')]}
    for i, j in enumerate(second_level_index)
]
df.style.set_table_styles(first_level_color_arrangement + second_level_color_arrangement)
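As a possible alternative to building CSS selectors by hand, the per-airline coloring can also be sketched as a function that returns a DataFrame of CSS strings, which Styler.apply accepts with axis=None. This is only a sketch under assumptions: the function name airline_row_css and the toy data are hypothetical, and unlike the selector approach above it styles the data cells only, not the index labels.

```python
import pandas as pd

def airline_row_css(df, streak_no, color_dict):
    """Build a frame of CSS strings shaped like df.

    Rows whose 'streak' equals streak_no get the background color that
    color_dict assigns to their airline (first index level); all other
    rows get an empty string, meaning no styling.
    """
    airlines = df.index.get_level_values(0)
    row_css = [
        f"background-color: {color_dict[airline]}" if streak == streak_no else ""
        for airline, streak in zip(airlines, df["streak"])
    ]
    # Repeat the per-row CSS string across every column.
    return pd.DataFrame({col: row_css for col in df.columns}, index=df.index)

# Toy stand-in for the grouped result; values are made up.
idx = pd.MultiIndex.from_tuples(
    [("AA", "ATL", "first"), ("AA", "ATL", "last"),
     ("EV", "PHX", "first"), ("EV", "PHX", "last")],
    names=["AIRLINE", "ORG_AIR", "streak_row"],
)
toy = pd.DataFrame(
    {"MONTH": [1, 1, "-", "-"], "DAY": [3, 5, "-", "-"],
     "streak": [3.0, 3.0, 0.0, 0.0]},
    index=idx,
)
colors = {"AA": "lightcoral", "EV": "lawngreen"}

css = airline_row_css(toy, 0, colors)
```

In a notebook this would be used as toy.style.apply(airline_row_css, axis=None, streak_no=0, color_dict=colors), which requires jinja2 to be installed.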