Pandas filter df by date range and condition

I have a dataframe with three datetime columns:

              ItemUid   HireStart    DCompleteDate       OffHire
              14055     2021-01-01       2021-12-17      2021-01-09
              14065     2021-08-12       2021-12-17      2021-11-17
              14534     2018-12-21             NaT             NaT
              11639           NaT              NaT             NaT
              43268     2020-09-07       2020-09-03      2020-11-03
              36723     2021-01-03             NaT       2021-01-10
             

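I am trying to return a dataframe containing the items that were on hire during a date range entered by the user.

For example, if the user enters start date = '2021-01-02' and end date = '2021-01-08', the expected result would be: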

          ItemUid   HireStart    DCompleteDate       OffHire
          14055     2021-01-01       2021-01-23      2021-01-09
          14534     2018-12-21             NaT             NaT
          36723     2021-01-03             NaT       2021-01-10
             

My code:

def date_range(df):
    start_date = input("Enter start date dd/mm/yyyy: ")
    end_date = input("Enter end date dd/mm/yyyy: ")

    df = df[(df['OffHire'] <= end_date) & 
             ((df['HireStart'].notna()) | (df['HireStart'] >= start_date))]
    
    return df

result = df_hire.apply(date_range, axis=1)

This currently raises the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-60-6d4d17020cba> in <module>()
      9     return df
     10 
---> 11 result = df_hire.apply(date_range, axis=1)

4 frames
<ipython-input-60-6d4d17020cba> in date_range(df)
      3     end_date = input("Enter end date dd/mm/yyyy: ")
      4 
----> 5     df = df[(df['OffHire'] <= end_date) & 
      6              ((df['HireStart'].notna()) | (df['HireStart'] >= start_date))]
      7 

TypeError: '<=' not supported between instances of 'Timestamp' and 'str'

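I can probably fix the error itself, but what has me stuck is how to apply the function!

Any help would be greatly appreciated; this will be another lesson for me!

Thanks in advance

IIUC, you want something like this: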

import pandas as pd

# Convert the date columns to datetime
df["HireStart"] = pd.to_datetime(df["HireStart"])
df["DCompleteDate"] = pd.to_datetime(df["DCompleteDate"])
df["OffHire"] = pd.to_datetime(df["OffHire"])

# Convert the user inputs (entered as dd/mm/yyyy) to datetime
start_date = pd.to_datetime(start_date, format="%d/%m/%Y")
end_date = pd.to_datetime(end_date, format="%d/%m/%Y")

# Keep rows where the hire started on or before end_date and was not completed
# before start_date; a missing DCompleteDate (NaT) is treated as still on hire
output = df[
    df["HireStart"].le(end_date)
    & df["DCompleteDate"].fillna(start_date).ge(start_date)
]
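On the "how do I apply the function" part: `df_hire.apply(date_range, axis=1)` passes each row to the function as a Series, so `df['OffHire']` inside it is a single Timestamp being compared with the raw input string, which is where the TypeError comes from. A boolean mask works on whole columns, so the function can be called once on the entire dataframe instead. Below is a minimal, self-contained sketch of that idea using the sample data from the question; the helper name `filter_on_hire` and its `date_format` parameter are made up for illustration, and the dates are passed as arguments rather than read with `input()` so the example runs unattended.

```python
import pandas as pd

# Sample data copied from the question; None becomes NaT after conversion.
df_hire = pd.DataFrame(
    {
        "ItemUid": [14055, 14065, 14534, 11639, 43268, 36723],
        "HireStart": ["2021-01-01", "2021-08-12", "2018-12-21", None,
                      "2020-09-07", "2021-01-03"],
        "DCompleteDate": ["2021-12-17", "2021-12-17", None, None,
                          "2020-09-03", None],
        "OffHire": ["2021-01-09", "2021-11-17", None, None,
                    "2020-11-03", "2021-01-10"],
    }
)
for col in ["HireStart", "DCompleteDate", "OffHire"]:
    df_hire[col] = pd.to_datetime(df_hire[col])


def filter_on_hire(df, start_date, end_date, date_format="%d/%m/%Y"):
    """Hypothetical helper: rows whose hire period overlaps the given range."""
    start = pd.to_datetime(start_date, format=date_format)
    end = pd.to_datetime(end_date, format=date_format)
    # Comparisons against NaT evaluate to False, so rows with no HireStart drop out.
    started = df["HireStart"].le(end)
    # A missing DCompleteDate is treated as "still on hire".
    not_completed = df["DCompleteDate"].fillna(start).ge(start)
    return df[started & not_completed]


# Call the function once on the whole dataframe, not through df.apply(axis=1).
result = filter_on_hire(df_hire, "02/01/2021", "08/01/2021")
print(result["ItemUid"].tolist())  # [14055, 14534, 36723]
```

With real user input, the two strings returned by `input()` could be passed straight into the helper in place of the literals.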

I think the best approach is to use HireStart as the index and take advantage of pandas slicing on a datetime index. Something like:

df.set_index('HireStart')['2021-01-02':'2021-01-08']
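A rough sketch of the slicing idea on the question's sample data follows. It assumes the rows with a missing HireStart are dropped and the index is sorted first, since label slicing on a datetime index generally expects a monotonic index, and it uses `.loc` for the slice. Note that this selects rows whose HireStart itself falls inside the range, which is not the same as hires that overlap the range, so on this data it returns only item 36723.

```python
import pandas as pd

# ItemUid and HireStart values copied from the question.
df = pd.DataFrame(
    {
        "ItemUid": [14055, 14065, 14534, 11639, 43268, 36723],
        "HireStart": pd.to_datetime(
            ["2021-01-01", "2021-08-12", "2018-12-21", None,
             "2020-09-07", "2021-01-03"]
        ),
    }
)

# Drop missing start dates and sort so the DatetimeIndex is monotonic.
indexed = df.dropna(subset=["HireStart"]).set_index("HireStart").sort_index()

# String slicing on a DatetimeIndex includes both endpoints.
sliced = indexed.loc["2021-01-02":"2021-01-08"]
print(sliced["ItemUid"].tolist())  # [36723]
```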