Python: find most recent date in one column with no matching date in another

I have two date columns representing clients' entry into and exit from a facility.

ID      entry_date  exit_date   original_entrydate
003246  2022-03-22  NaN         2012-10-01
003246  2015-07-24  2022-03-22  2012-10-01
003246  2012-10-01  2015-07-24  2012-10-01
003246  2001-02-02  2010-04-05  2001-02-02

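For all instances of an ID in the table, I need to match entry_date against exit_date to find the most recent entry date that marks the start of an unbroken span (a period in which the ID moved between facilities but did not leave care), and return it in a column, original_entrydate.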

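In the example, the first three rows have an original_entrydate of 2012-10-01 because that entry_date has no matching exit_date, which indicates a separation from care; the dates show it lasted two years and some months. If there are further records for the ID, the process resets and looks at the records from before that separation, back to the next separation.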

I solved my problem in the clumsiest way imaginable, by creating nested if-else statements:

res_phys_levels['ORIGINAL_AdmDt']=''

for i in range(0, len(res_phys_levels)):
    start_ID = res_phys_levels.iloc[i]['Individual_ID']
    aDate = res_phys_levels.iloc[i]['Admit_Date']
    id_count = res_phys_levels.Individual_ID.value_counts()[start_ID]
    if id_count == 1:     #if there's only one instance of Individual ID in table, then ORIGINAL_AdmDt = Admit_Date
        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
    else:                 #if there's more than one instance of Individual_ID, then--
        j = i+1        
        next_ID = res_phys_levels.iloc[j]['Individual_ID']
        if start_ID != next_ID:
            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
        else: 
            sDate = res_phys_levels.iloc[j]['SEPARATION_DATE']
            if aDate != sDate: 
                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
            else: 
                aDate = res_phys_levels.iloc[j]['Admit_Date']
                k = j+1
                next_ID = res_phys_levels.iloc[k]['Individual_ID']
                if start_ID != next_ID:
                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                else: 
                    sDate = res_phys_levels.iloc[k]['SEPARATION_DATE']
                    if aDate != sDate: 
                        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                    else: 
                        aDate = res_phys_levels.iloc[k]['Admit_Date']
                        m = k+1
                        next_ID = res_phys_levels.iloc[m]['Individual_ID']
                        if start_ID != next_ID:
                            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                        else:
                            sDate = res_phys_levels.iloc[m]['SEPARATION_DATE']
                            if aDate != sDate:
                                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                            else:
                                aDate = res_phys_levels.iloc[m]['Admit_Date']
                                n = m+1
                                next_ID = res_phys_levels.iloc[n]['Individual_ID']
                                if start_ID != next_ID:
                                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate

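This works for two reasons: the dataframe is sorted by 'Individual_ID' ascending and 'Admit_Date' descending, so the nested if-else statements can compare the 'Individual_ID' at index[i] with the subsequent rows until all possibilities are exhausted.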

I also know there is a maximum of 4 rows per ID.

But please show me a better, more pythonic way!
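
For reference, here is a rough sketch of the kind of thing I imagine a vectorized version would look like, written against the small example table above rather than my real data. The function name `add_original_entry` and its parameters are made up for illustration. It assumes the date columns are already datetime64 (missing exits stored as NaT) and that the dataframe index is unique. The idea: sort each ID oldest-first, flag a row as the start of a span when its entry date differs from the previous row's exit date, then forward-fill the span start within each ID.

```python
import pandas as pd


def add_original_entry(df, id_col="ID", entry_col="entry_date",
                       exit_col="exit_date", out_col="original_entrydate"):
    """Hypothetical helper: label each row with the entry date that
    started its unbroken span of care."""
    # Oldest stay first within each ID, so "previous row" means "previous stay".
    ordered = df.sort_values([id_col, entry_col])
    # Exit date of the previous stay for the same ID (NaT for the first stay).
    prev_exit = ordered.groupby(id_col)[exit_col].shift()
    # A span starts wherever the entry date does not equal the previous exit date.
    starts_span = ordered[entry_col].ne(prev_exit)
    # Keep the entry date only on span-start rows, then carry it forward per ID.
    span_start = ordered[entry_col].where(starts_span)
    original = span_start.groupby(ordered[id_col]).ffill()
    out = df.copy()
    out[out_col] = original  # aligned on the (unique) index, so row order is kept
    return out


# The example table from the top of the post.
df = pd.DataFrame({
    "ID": ["003246"] * 4,
    "entry_date": pd.to_datetime(
        ["2022-03-22", "2015-07-24", "2012-10-01", "2001-02-02"]),
    "exit_date": pd.to_datetime(
        [None, "2022-03-22", "2015-07-24", "2010-04-05"]),
})

result = add_original_entry(df)
print(result)
```

With my real column names the call would presumably be `add_original_entry(res_phys_levels, id_col='Individual_ID', entry_col='Admit_Date', exit_col='SEPARATION_DATE', out_col='ORIGINAL_AdmDt')`. Unlike the nested version, this should not depend on the 4-rows-per-ID limit or on how the frame happens to be sorted beforehand.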