How to find the closest column name based on row conditions in a Python DataFrame?

Question

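Each ID has a rank between 1 and 100. The goal is to understand how the rank changes over the dates.

Please help!

1. I want to find the first date, from the left, where the value is not 0 and is closest to 1.

2. I want to find the last date, from the right, with the best rank.

3. I want to find the best rank (the minimum value) across all dates.

Input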

import pandas as pd

d = {'ID': ["id1","id2"], '26-01-2021': [0, 15],'01-02-2021': [12, 17],'01-03-2021': [58, 17]}
df = pd.DataFrame(data=d)

ID   26-01-2021  01-02-2021  01-03-2021
id1           0          12          58
id2          15          17          17

Desired output

ID   26-01-2021  01-02-2021  01-03-2021  first_date_rank  last_date_rank  best_latest_date_rank  best_rank
id1           0          12          58  01-02-2021       01-03-2021      01-02-2021                    12
id2          15          17          17  26-01-2021       01-03-2021      26-01-2021                    15

I tried argmin, but it does not work:

def get_date(row):
    date_range = row[dft.columns[1:]]
    closest_value_key = abs(100 - date_range).argmin()
    closest_date = date_range[closest_value_key]
    column_name = date_range.keys()[closest_value_key]
    return pd.Series((closest_date, column_name))

dft[['best_latest_date_rank', 'best_rank']] = dft.apply(lambda row:get_date(row), axis=1)

Please help!

Answer 1

You can use idxmax and idxmin here, with a few adjustments:

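import numpy as np

u = df.set_index("ID").replace(0, np.nan)  # treat 0 as "no rank" so it is ignored
first_date_rank = u.idxmin(axis=1)  # earliest date holding the best (lowest) rank
last_date_rank = u.iloc[:, ::-1].idxmax(axis=1)  # scan right to left for the largest rank value
best_rank = u.min(axis=1)  # best (lowest) rank across all dates

out = u.assign(first_date_rank=first_date_rank, last_date_rank=last_date_rank,
               best_latest_date_rank=first_date_rank, best_rank=best_rank).reset_index()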

print(out)

    ID  26-01-2021  01-02-2021  01-03-2021 first_date_rank last_date_rank  \
0  id1         NaN          12          58      01-02-2021     01-03-2021   
1  id2        15.0          17          17      26-01-2021     01-03-2021   

  best_latest_date_rank  best_rank  
0            01-02-2021       12.0  
1            26-01-2021       15.0
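
The two adjustments in this answer (replacing 0 with NaN, and reversing the columns) can be checked in isolation. The sketch below rebuilds the sample frame from the question; the variable names `naive_best`, `earliest_best`, `latest_worst` and `earliest_worst` are made up for illustration. It relies on `idxmin`/`idxmax` skipping NaN and returning the first matching label when values tie.

```python
import numpy as np
import pandas as pd

# Sample data from the question; 0 is assumed to mean "no rank on that date".
df = pd.DataFrame({
    "ID": ["id1", "id2"],
    "26-01-2021": [0, 15],
    "01-02-2021": [12, 17],
    "01-03-2021": [58, 17],
})

# Without the replacement, the 0 placeholder wins as the minimum for id1.
naive_best = df.set_index("ID").idxmin(axis=1)

# With 0 turned into NaN, idxmin ignores it.
u = df.set_index("ID").replace(0, np.nan)
earliest_best = u.idxmin(axis=1)

# id2 has rank 17 on two dates: the column order decides which label is returned.
earliest_worst = u.idxmax(axis=1)               # scans left to right
latest_worst = u.iloc[:, ::-1].idxmax(axis=1)   # scans right to left

print(naive_best["id1"])      # 26-01-2021
print(earliest_best["id1"])   # 01-02-2021
print(earliest_worst["id2"])  # 01-02-2021
print(latest_worst["id2"])    # 01-03-2021
```

So reversing the columns only matters when a row contains tied values; otherwise both scan directions return the same date.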

Answer 2

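Inspired by @anky's answer, this covers the best and worst cases, and also gets the first and last non-null values in the DataFrame columns along with the index of the first non-null value. Full code: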

import numpy as np
import pandas as pd

u = df.set_index("ID").replace(0, np.nan)  # use ID as the index; treat 0 as "no rank"
rev = u.iloc[:, ::-1]  # same columns in reverse order (latest date first)
most_recent_best_date = rev.idxmin(axis=1)  # latest date holding the best rank
worst_date_rank = rev.idxmax(axis=1)  # latest date holding the worst rank
worst_rank = u.max(axis=1)
most_recent_best_rank = u.min(axis=1)
Days_spent_in_top_100 = u.notnull().sum(axis=1)  # number of dates with a rank

out = u.assign(most_recent_best_date=most_recent_best_date,
               worst_date_rank=worst_date_rank,
               worst_rank=worst_rank,
               most_recent_best_rank=most_recent_best_rank,
               Days_spent_in_top_100=Days_spent_in_top_100).reset_index()

# The lookups below run on u (date columns only) so the derived columns are not scanned;
# .to_numpy() assigns by position because reset_index() changed the index of out.
# first non-null rank in each row
out['first_day_rank'] = u.bfill(axis=1).iloc[:, 0].to_numpy()
# last non-null rank in each row
out['last_day_rank'] = u.ffill(axis=1).iloc[:, -1].to_numpy()
# column label (date) of the first non-null rank
out['first_entered_day'] = u.apply(pd.Series.first_valid_index, axis=1).to_numpy()
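
Both answers depend on the date columns already being in chronological order, since "first from the left" and "last from the right" follow column position. The sample data is ordered, but if a frame arrived shuffled, one possible preparation step is sketched below. It assumes the labels are strings in dd-mm-yyyy format, as in the question; the shuffled frame itself is invented for illustration.

```python
import pandas as pd

# Same values as the question, with the date columns deliberately out of order.
df = pd.DataFrame({
    "ID": ["id1", "id2"],
    "01-03-2021": [58, 17],
    "26-01-2021": [0, 15],
    "01-02-2021": [12, 17],
})

u = df.set_index("ID")

# Parse the labels so they sort as dates rather than as strings.
u.columns = pd.to_datetime(u.columns, format="%d-%m-%Y")
u = u.sort_index(axis=1)

# Optionally convert the labels back to the original text format.
u.columns = u.columns.strftime("%d-%m-%Y")

print(list(u.columns))  # ['26-01-2021', '01-02-2021', '01-03-2021']
```

After this step `u` can be passed through the `replace(0, np.nan)` and `idxmin`/`idxmax` calls shown in the answers.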
