Pandas filter df by date range and condition
I have a dataframe containing 3 datetime columns:
ItemUid  HireStart   DCompleteDate  OffHire
14055    2021-01-01  2021-12-17     2021-01-09
14065    2021-08-12  2021-12-17     2021-11-17
14534    2018-12-21  NaT            NaT
11639    NaT         NaT            NaT
43268    2020-09-07  2020-09-03     2020-11-03
36723    2021-01-03  NaT            2021-01-10
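I am trying to return a dataframe containing the items that were on hire during a date range entered by the user.
I.e., if the user enters start date = '2021-01-02' and end date = '2021-01-08', the expected result would be: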
ItemUid  HireStart   DCompleteDate  OffHire
14055    2021-01-01  2021-01-23     2021-01-09
14534    2018-12-21  NaT            NaT
36723    2021-01-03  NaT            2021-01-10
My code:
def date_range(df):
    start_date = input("Enter start date dd/mm/yyyy: ")
    end_date = input("Enter end date dd/mm/yyyy: ")

    df = df[(df['OffHire'] <= end_date) &
            ((df['HireStart'].notna()) | (df['HireStart'] >= start_date))]

    return df

result = df_hire.apply(date_range, axis=1)
This currently raises an error:
TypeError Traceback (most recent call last)
<ipython-input-60-6d4d17020cba> in <module>()
9 return df
10
---> 11 result = df_hire.apply(date_range, axis=1)
4 frames
<ipython-input-60-6d4d17020cba> in date_range(df)
3 end_date = input("Enter end date dd/mm/yyyy: ")
4
----> 5 df = df[(df['OffHire'] <= end_date) &
6 ((df['HireStart'].notna()) | (df['HireStart'] >= start_date))]
7
TypeError: '<=' not supported between instances of 'Timestamp' and 'str'
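I can probably fix the error myself, but it is how to apply the function that has me stuck!
Any help would be greatly appreciated; it would be another lesson for me!
Thanks in advance.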
IIUC, you want something like this:
import pandas as pd

# convert the date columns to datetime
df["HireStart"] = pd.to_datetime(df["HireStart"])
df["DCompleteDate"] = pd.to_datetime(df["DCompleteDate"])
df["OffHire"] = pd.to_datetime(df["OffHire"])

# convert the user inputs (dd/mm/yyyy strings) to datetime
start_date = pd.to_datetime(start_date, format="%d/%m/%Y")
end_date = pd.to_datetime(end_date, format="%d/%m/%Y")

# select rows hired on or before end_date and not completed before start_date
# (a missing DCompleteDate is treated as still on hire)
output = df[df["HireStart"].le(end_date) & df["DCompleteDate"].fillna(start_date).ge(start_date)]
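To tie this back to the original `date_range` function, here is a minimal sketch. It assumes the filter is meant to run once on the whole frame rather than per row: `apply(axis=1)` hands the function one row at a time, so `df['OffHire']` becomes a single `Timestamp`, and comparing that with the raw input string is what raises the `TypeError`. The helper name `filter_hired` and its `fmt` parameter are made up for illustration, and the dates are passed as arguments instead of being read with `input()` so the function is easy to test.

```python
import pandas as pd

# Sample data copied from the question (None becomes NaT after conversion).
df_hire = pd.DataFrame({
    "ItemUid": [14055, 14065, 14534, 11639, 43268, 36723],
    "HireStart": ["2021-01-01", "2021-08-12", "2018-12-21", None, "2020-09-07", "2021-01-03"],
    "DCompleteDate": ["2021-12-17", "2021-12-17", None, None, "2020-09-03", None],
    "OffHire": ["2021-01-09", "2021-11-17", None, None, "2020-11-03", "2021-01-10"],
})
for col in ["HireStart", "DCompleteDate", "OffHire"]:
    df_hire[col] = pd.to_datetime(df_hire[col])


def filter_hired(df, start_date, end_date, fmt="%d/%m/%Y"):
    """Hypothetical helper: rows on hire at some point in [start_date, end_date]."""
    start = pd.to_datetime(start_date, format=fmt)
    end = pd.to_datetime(end_date, format=fmt)
    # Hire must have started by the end of the window (NaT compares as False).
    started = df["HireStart"].le(end)
    # A missing completion date is treated as "still on hire".
    not_finished = df["DCompleteDate"].fillna(start).ge(start)
    return df[started & not_finished]


# Call the function on the DataFrame itself, not through apply(axis=1).
result = filter_hired(df_hire, "02/01/2021", "08/01/2021")
print(result["ItemUid"].tolist())  # [14055, 14534, 36723]
```

Wrapping the two `input()` calls around `filter_hired(df_hire, start, end)` would restore the interactive behaviour from the question.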
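I think the best approach is to use HireStart as the index and take advantage of pandas label slicing on a datetime index.
The index needs to be sorted and free of NaT for the slice to work. Something like:
df.dropna(subset=['HireStart']).set_index('HireStart').sort_index().loc['2021-01-02':'2021-01-08']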
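A small sketch of the index-based approach on the question's sample data, assuming rows without a `HireStart` can simply be dropped and that the index is sorted before slicing. Note that this selects on `HireStart` alone, so it returns items whose hire started inside the window, not items that were already on hire when the window opened.

```python
import pandas as pd

# Only the columns needed for the slice; values are taken from the question.
df = pd.DataFrame({
    "ItemUid": [14055, 14065, 14534, 11639, 43268, 36723],
    "HireStart": pd.to_datetime(
        ["2021-01-01", "2021-08-12", "2018-12-21", None, "2020-09-07", "2021-01-03"]
    ),
})

# Drop NaT, index by HireStart, and sort so label slicing is well defined.
by_start = df.dropna(subset=["HireStart"]).set_index("HireStart").sort_index()

# Both slice endpoints are inclusive for label-based slicing.
window = by_start.loc["2021-01-02":"2021-01-08"]
print(window["ItemUid"].tolist())  # [36723]
```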