Python: find the most recent date in one column with no matching date in another
I have two date columns representing a client's entry into and exit from a facility.
ID | entry_date | exit_date | original_entrydate |
---|---|---|---|
003246 | 2022-03-22 | NaN | 2012-10-01 |
003246 | 2015-07-24 | 2022-03-22 | 2012-10-01 |
003246 | 2012-10-01 | 2015-07-24 | 2012-10-01 |
003246 | 2001-02-02 | 2010-04-05 | 2001-02-02 |
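For all instances of an ID in the table, I need to match entry_date against exit_date to find the most recent entry date that marks the start of an unbroken span of time, one in which the ID moved between facilities but did not leave care, and return it in a column, original_entrydate.
In the example, original_entrydate is 2012-10-01 for the first three rows because that entry_date has no matching exit_date, which indicates a separation from care; the dates show it lasted two years and some months. If the ID has additional records, the process resets and looks at the records before original_entrydate for the start of the previous span, continuing until the next separation.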
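To make the rule concrete, here is a rough sketch of how the "does this entry_date match an earlier exit_date" check might look in pandas. It uses the column names from the table above; `prev_exit` and `starts_span` are names I made up for illustration, and it assumes a match means the dates are exactly equal.

```python
import pandas as pd

# Sample rows copied from the table above; the missing exit_date becomes NaT.
df = pd.DataFrame({
    "ID": ["003246"] * 4,
    "entry_date": ["2022-03-22", "2015-07-24", "2012-10-01", "2001-02-02"],
    "exit_date": [None, "2022-03-22", "2015-07-24", "2010-04-05"],
})
df["entry_date"] = pd.to_datetime(df["entry_date"])
df["exit_date"] = pd.to_datetime(df["exit_date"])

# Work oldest-first within each ID so the previous stay is simply the prior row.
df = df.sort_values(["ID", "entry_date"]).reset_index(drop=True)

# Exit date of the previous stay for the same ID (NaT for the first stay).
prev_exit = df.groupby("ID")["exit_date"].shift()

# A stay opens a new span when its entry_date differs from the previous exit_date.
df["starts_span"] = df["entry_date"].ne(prev_exit)
```

With the sample rows, the stays beginning 2001-02-02 and 2012-10-01 are flagged as span starts, and the two later stays are not.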
I solved my problem in the clunkiest way imaginable, by writing nested if-else statements:

    res_phys_levels['ORIGINAL_AdmDt'] = ''
    for i in range(0, len(res_phys_levels)):
        start_ID = res_phys_levels.iloc[i]['Individual_ID']
        aDate = res_phys_levels.iloc[i]['Admit_Date']
        id_count = res_phys_levels.Individual_ID.value_counts()[start_ID]
        if id_count == 1:  # only one instance of Individual_ID in the table, so ORIGINAL_AdmDt = Admit_Date
            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
        else:  # more than one instance of Individual_ID, so walk the rows that follow
            j = i+1
            next_ID = res_phys_levels.iloc[j]['Individual_ID']
            if start_ID != next_ID:
                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
            else:
                sDate = res_phys_levels.iloc[j]['SEPARATION_DATE']
                if aDate != sDate:
                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                else:
                    aDate = res_phys_levels.iloc[j]['Admit_Date']
                    k = j+1
                    next_ID = res_phys_levels.iloc[k]['Individual_ID']
                    if start_ID != next_ID:
                        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                    else:
                        sDate = res_phys_levels.iloc[k]['SEPARATION_DATE']
                        if aDate != sDate:
                            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                        else:
                            aDate = res_phys_levels.iloc[k]['Admit_Date']
                            m = k+1
                            next_ID = res_phys_levels.iloc[m]['Individual_ID']
                            if start_ID != next_ID:
                                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                            else:
                                sDate = res_phys_levels.iloc[m]['SEPARATION_DATE']
                                if aDate != sDate:
                                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                                else:
                                    # the chain reaches the fourth row, so take its Admit_Date
                                    aDate = res_phys_levels.iloc[m]['Admit_Date']
                                    n = m+1
                                    next_ID = res_phys_levels.iloc[n]['Individual_ID']
                                    if start_ID != next_ID:
                                        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate

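This works for two reasons. First, the dataframe is sorted by 'Individual_ID' ascending and 'Admit_Date' descending, so the nested if-else statements can compare 'Individual_ID' at index[i] with the rows that follow until all possibilities are exhausted.
Second, I know there is a maximum of 4 rows per ID.
But please, show me a better, more pythonic way!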
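For reference, this is the kind of loop-free version I imagine might exist. It is only a sketch under my own assumptions: the date columns are already datetime64, a "match" means exact equality, and the helper name `add_original_admit_date` plus the second ID "004000" in the sample are made up for illustration. It relies on the same sort order as my loop (Individual_ID ascending, Admit_Date descending) but should not depend on the 4-row limit.

```python
import pandas as pd


def add_original_admit_date(frame):
    """Return a sorted copy with ORIGINAL_AdmDt set to the Admit_Date that opened each unbroken span."""
    out = frame.sort_values(
        ["Individual_ID", "Admit_Date"], ascending=[True, False]
    ).reset_index(drop=True)
    # With newest-first ordering, the chronologically previous stay is the next row.
    prev_sep = out.groupby("Individual_ID")["SEPARATION_DATE"].shift(-1)
    # Keep Admit_Date only on rows that open a span; all other rows become NaT.
    span_start = out["Admit_Date"].where(out["Admit_Date"].ne(prev_sep))
    # Back-fill within each ID so the newer rows inherit the opening date of their span.
    out["ORIGINAL_AdmDt"] = span_start.groupby(out["Individual_ID"]).bfill()
    return out


# Sample data: the four rows from the question plus one hypothetical single-row ID.
sample = pd.DataFrame({
    "Individual_ID": ["003246"] * 4 + ["004000"],
    "Admit_Date": pd.to_datetime(
        ["2022-03-22", "2015-07-24", "2012-10-01", "2001-02-02", "2019-05-01"]
    ),
    "SEPARATION_DATE": pd.to_datetime(
        [None, "2022-03-22", "2015-07-24", "2010-04-05", None]
    ),
})
result = add_original_admit_date(sample)
```

On the sample, the first three rows for 003246 get 2012-10-01, the oldest row gets 2001-02-02, and the single-row ID keeps its own Admit_Date, which is what my nested loop is meant to produce.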