Python: find most recent date in one column with no matching date in another

Question

I have two date columns representing a client's entry into and exit from facilities.

ID	entry_date	exit_date	original_entrydate
003246	2022-03-22	NaN	2012-10-01
003246	2015-07-24	2022-03-22	2012-10-01
003246	2012-10-01	2015-07-24	2012-10-01
003246	2001-02-02	2010-04-05	2001-02-02

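For every instance of an ID in the table, I need to match entry_date against exit_date to find the most recent entry date that marks the start of an uninterrupted span, that is, a period in which the ID moved between facilities but did not leave care, and return it in a single column, original_entrydate.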

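In the example, the first three rows have an original_entrydate of 2012-10-01 because that entry_date has no matching exit_date. The mismatch signifies a separation from care, which the dates show lasted two years and some months. If the ID has additional records, the process resets and looks at the records that preceded that separation, back to the next separation.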
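To make the rule concrete, here is a minimal pure-Python sketch applied to the sample rows above. It assumes that an entry continues a span only when it equals an exit_date of the same ID exactly, and that every exit_date is later than the entry_date on its own row. The function name `original_entry_dates` and the tuple layout are illustrative, not part of the original data.

```python
from datetime import date

# Sample rows from the table above: (ID, entry_date, exit_date).
# None stands in for the NaN exit_date of the still-open stay.
rows = [
    ("003246", date(2022, 3, 22), None),
    ("003246", date(2015, 7, 24), date(2022, 3, 22)),
    ("003246", date(2012, 10, 1), date(2015, 7, 24)),
    ("003246", date(2001, 2, 2), date(2010, 4, 5)),
]


def original_entry_dates(rows):
    """Return one original entry date per row, in input order."""
    # Collect every exit date recorded for each ID.
    exits = {}
    for rid, _entry, exit_ in rows:
        if exit_ is not None:
            exits.setdefault(rid, set()).add(exit_)

    # An entry date starts a span when no exit date of the same ID equals it.
    starts = {}
    for rid, entry, _exit in rows:
        if entry not in exits.get(rid, set()):
            starts.setdefault(rid, []).append(entry)

    # Each row belongs to the span with the latest start on or before its entry.
    return [
        max(s for s in starts[rid] if s <= entry)
        for rid, entry, _exit in rows
    ]


result = original_entry_dates(rows)
```

For the sample rows, this yields 2012-10-01 for the first three rows and 2001-02-02 for the last, matching the original_entrydate column shown in the table.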

Answer 1

I solved my problem in the clumsiest way imaginable: by creating nested if-else statements:

res_phys_levels['ORIGINAL_AdmDt']=''

for i in range(0, len(res_phys_levels)):
    start_ID = res_phys_levels.iloc[i]['Individual_ID']
    aDate = res_phys_levels.iloc[i]['Admit_Date']
    id_count = res_phys_levels.Individual_ID.value_counts()[start_ID]
    if id_count == 1:     #if there's only one instance of Individual ID in table, then ORIGINAL_AdmDt = Admit_Date
        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
    else:                 #if there's more than one instance of Individual_ID, then--
        j = i+1        
        next_ID = res_phys_levels.iloc[j]['Individual_ID']
        if start_ID != next_ID:
            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
        else: 
            sDate = res_phys_levels.iloc[j]['SEPARATION_DATE']
            if aDate != sDate: 
                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
            else: 
                aDate = res_phys_levels.iloc[j]['Admit_Date']
                k = j+1
                next_ID = res_phys_levels.iloc[k]['Individual_ID']
                if start_ID != next_ID:
                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                else: 
                    sDate = res_phys_levels.iloc[k]['SEPARATION_DATE']
                    if aDate != sDate: 
                        res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                    else: 
                        aDate = res_phys_levels.iloc[k]['Admit_Date']
                        m = k+1
                        next_ID = res_phys_levels.iloc[m]['Individual_ID']
                        if start_ID != next_ID:
                            res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                        else:
                            sDate = res_phys_levels.iloc[m]['SEPARATION_DATE']
                            if aDate != sDate:
                                res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate
                            else:
                                aDate = res_phys_levels.iloc[m]['Admit_Date']
                                n = m+1
                                next_ID = res_phys_levels.iloc[n]['Individual_ID']
                                if start_ID != next_ID:
                                    res_phys_levels.at[i, 'ORIGINAL_AdmDt'] = aDate

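This works for two reasons. First, the dataframe is sorted by 'Individual_ID' ascending and 'Admit_Date' descending, so the nested if-else statements can compare the 'Individual_ID' at index i with each subsequent row until all possibilities are exhausted.

Second, I know there are at most 4 rows per ID.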

But please, show me a better, more Pythonic way!
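One possible direction, offered as a sketch rather than a tested replacement: shift SEPARATION_DATE within each Individual_ID so every row can be compared with the previous stay, flag the rows where Admit_Date does not match, and forward-fill those admit dates within each ID. It assumes Admit_Date and SEPARATION_DATE are datetime columns and that stays for one ID do not overlap. The function name `add_original_admit_date` and the sample frame are hypothetical; the sample reuses the dates from the question under the answer's column names.

```python
import pandas as pd


def add_original_admit_date(df):
    """Return a sorted copy of df with an ORIGINAL_AdmDt column added."""
    # Work oldest-first within each individual so "previous" means "earlier stay".
    out = df.sort_values(["Individual_ID", "Admit_Date"]).copy()

    # Separation date of the previous stay for the same individual
    # (NaT on each individual's first stay).
    prev_sep = out.groupby("Individual_ID")["SEPARATION_DATE"].shift()

    # A stay starts a new span when it does not pick up exactly where the
    # previous one ended; comparing against NaT also counts as a new span.
    starts_span = out["Admit_Date"].ne(prev_sep)

    # Keep Admit_Date only on span starts, then carry it forward within each ID.
    out["ORIGINAL_AdmDt"] = (
        out["Admit_Date"]
        .where(starts_span)
        .groupby(out["Individual_ID"])
        .ffill()
    )

    # Restore the ordering used above: ID ascending, Admit_Date descending.
    return out.sort_values(
        ["Individual_ID", "Admit_Date"], ascending=[True, False]
    )


# Hypothetical sample built from the dates in the question.
sample = pd.DataFrame(
    {
        "Individual_ID": ["003246"] * 4,
        "Admit_Date": pd.to_datetime(
            ["2022-03-22", "2015-07-24", "2012-10-01", "2001-02-02"]
        ),
        "SEPARATION_DATE": pd.to_datetime(
            [None, "2022-03-22", "2015-07-24", "2010-04-05"]
        ),
    }
)
result = add_original_admit_date(sample)
```

Because the comparison is done per group, this version does not depend on the four-rows-per-ID limit and never indexes past the last row of the frame.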

python

python-datetime