I am trying to check whether a timestamp falls into one of three bins. When I compare, I get "Invalid comparison between dtype=datetime64[ns] and time"
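The objective of this code is to look at the 'time' column, which has dtype datetime64, and determine whether it falls within a given time period. To create the bins and the subsequent comparison, I use the following: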
df_sum['start_time'] = pd.to_datetime(df_sum['start_time'])
df_sum['time'] = df_sum['start_time'].dt.strftime('%H:%M')
df_sum['time']=pd.to_datetime(df_sum['time'])
am_peak_start = pd.Timestamp('2021-01-01 06:00:00').time()
am_peak_end = pd.Timestamp('2021-01-01 09:00:00').time()
md_peak_end = pd.Timestamp('2021-01-01 16:00:00').time()
pm_peak_end = pd.Timestamp('2021-01-01 19:00:00').time()
am_condition = ((df_sum['time'] >= am_peak_start) & (df_sum['time'] < am_peak_end))
md_condition = ((df_sum['time'] >= am_peak_end) & (df_sum['time'] < md_peak_end))
pm_condition = ((df_sum['time'] >= md_peak_end) & (df_sum['time'] < pm_peak_end))
conditions = [am_condition, md_condition, pm_condition]
choices = ['am', 'md', 'pm']
df_sum['peak_period'] = np.select(conditions, choices, default = 'off-peak')
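But this raises an error, because a datetime64 column cannot be compared with a time object. I am not sure what I need to do to fix this.
This error appears whenever you try to compare two different date/time-related classes.
You can add an hour column to your df: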
df_sum['hour'] = pd.to_datetime(df_sum['start_time']).dt.hour
and define your peaks in hours rather than hours:minutes, if you don't need the minutes:
am_peak_start = pd.Timestamp('2021-01-01 06:00:00').time().hour
Then continue as expected.
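To make "continue as expected" concrete, here is a minimal sketch of the hour-based version run end to end. The sample rows in df_sum are made up for illustration, and the peak boundaries are written as plain integers instead of going through pd.Timestamp(...).time().hour; the rest follows the question's np.select logic with its half-open intervals.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's df_sum
df_sum = pd.DataFrame({
    "start_time": [
        "2021-01-01 05:30:00",
        "2021-01-01 06:00:00",
        "2021-01-01 09:00:00",
        "2021-01-01 16:45:00",
        "2021-01-01 19:00:00",
    ]
})

# Integer hour of day, so both sides of each comparison are plain integers
df_sum["hour"] = pd.to_datetime(df_sum["start_time"]).dt.hour

# Peak boundaries as integer hours (same values as in the question)
am_peak_start, am_peak_end, md_peak_end, pm_peak_end = 6, 9, 16, 19

hour = df_sum["hour"]
conditions = [
    (hour >= am_peak_start) & (hour < am_peak_end),  # 06:00 up to 09:00
    (hour >= am_peak_end) & (hour < md_peak_end),    # 09:00 up to 16:00
    (hour >= md_peak_end) & (hour < pm_peak_end),    # 16:00 up to 19:00
]
choices = ["am", "md", "pm"]

# Rows matching none of the conditions fall back to the default label
df_sum["peak_period"] = np.select(conditions, choices, default="off-peak")
print(df_sum)
```

Note that this only works cleanly because every boundary sits on a whole hour; a boundary such as 09:30 would need the minutes as well.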
Try this:
I created a sample dataframe based on your example:
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['time'] = np.array(['2021-01-01 06:00:00','2021-01-01 07:00:00','2021-01-01 08:00:00','2021-01-01 09:00:00','2021-01-01 09:55:00',
'2021-01-01 10:00:00','2021-01-01 17:00:00','2021-01-01 19:00:00','2021-01-01 20:00:00','2021-01-01 21:00:00'])
df['time'] = pd.to_datetime(df['time'])
Here is the new column built from the conditions:
df['new'] = np.where((df['time'].dt.hour >= 6) & (df['time'].dt.hour <= 9), 'am',
np.where((df['time'].dt.hour >= 9) & (df['time'].dt.hour <= 16), 'md',
np.where((df['time'].dt.hour >= 16) & (df['time'].dt.hour <= 19), 'pm','NA') ) )
Output:
                 time new
0 2021-01-01 06:00:00  am
1 2021-01-01 07:00:00  am
2 2021-01-01 08:00:00  am
3 2021-01-01 09:00:00  am
4 2021-01-01 09:55:00  am
5 2021-01-01 10:00:00  md
6 2021-01-01 17:00:00  pm
7 2021-01-01 19:00:00  pm
8 2021-01-01 20:00:00  NA
9 2021-01-01 21:00:00  NA
Feel free to modify the conditions in the code as needed.
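One thing to watch with the hour-based conditions above: because they use inclusive <= bounds on the hour, a time like 09:55 is labelled 'am'. If you want the half-open intervals from the question and minute resolution, one option is to compare like with like by taking .dt.time from the datetime column and comparing it against datetime.time objects. The following is a rough sketch, not the only way to do it; the sample timestamps are made up for illustration.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical sample data around the bin boundaries
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-01-01 05:59:00",
        "2021-01-01 06:00:00",
        "2021-01-01 08:59:00",
        "2021-01-01 09:00:00",
        "2021-01-01 15:59:00",
        "2021-01-01 16:00:00",
        "2021-01-01 19:00:00",
    ])
})

# .dt.time gives an object column of datetime.time values, so both
# sides of each comparison below are time-of-day objects
tod = df["time"].dt.time

am_peak_start = dt.time(6, 0)
am_peak_end = dt.time(9, 0)
md_peak_end = dt.time(16, 0)
pm_peak_end = dt.time(19, 0)

conditions = [
    (tod >= am_peak_start) & (tod < am_peak_end),
    (tod >= am_peak_end) & (tod < md_peak_end),
    (tod >= md_peak_end) & (tod < pm_peak_end),
]
choices = ["am", "md", "pm"]

df["peak_period"] = np.select(conditions, choices, default="off-peak")
print(df)
```

Comparisons on an object column run element by element, so on a very large frame the integer-hour approach (or minutes since midnight) is likely to be faster.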