Python pandas: calculating hours for a date range in a DataFrame
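I want to calculate on-call hours over a date range. Standard on-call hours are 16 hours per day Monday through Friday, and 24 hours on Saturday and Sunday.
I have already written code that works for two specific dates: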
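import datetime
from datetime import date

date1 = date(2017, 4, 13)
date2 = date(2017, 4, 17)

def daterange(d1, d2):
    # yield every date from d1 to d2, inclusive
    return (d1 + datetime.timedelta(days=i) for i in range((d2 - d1).days + 1))

total = 0
for n in daterange(date1, date2):
    if n.weekday() < 5:   # Monday to Friday
        total += 16
    else:                 # Saturday and Sunday
        total += 24
print(total)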
I am having trouble applying this to the date ranges in a DataFrame:
Start End
2017-02-03 2017-03-15
2017-02-05 2017-03-16
2017-02-06 2017-03-17
2017-02-10 2017-03-18
... ...
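The columns above have dtype datetime64[ns].
The error is: TypeError: cannot convert the series to class 'int'
Is there a way to calculate this for datetime columns? The result can go in a new column or simply be returned.
Thanks in advance!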
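You need to use apply for this. The error is just telling you that the function is not being called correctly, most likely because whole columns were passed where single dates are expected.
In pandas, the apply method with axis=1 applies a function to each row of the DataFrame (row by row); without axis=1 it works column by column.
Change the function call on your DataFrame to something like:
df['new_column'] = df.apply(lambda x: sum(16 if d.weekday() < 5 else 24 for d in daterange(x['Start'], x['End'])), axis=1)
Let me know if you need further help.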
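For reference, here is a minimal self-contained sketch of the row-wise approach. The helper name `oncall_hours` and the way the sample DataFrame is constructed are assumptions for illustration; the dates are the four rows shown in the question.

```python
import datetime

import pandas as pd


def oncall_hours(start, end):
    """Sum on-call hours for every day from start to end, inclusive."""
    total = 0
    for i in range((end - start).days + 1):
        day = start + datetime.timedelta(days=i)
        # Monday=0 ... Friday=4 are 16-hour days; Saturday and Sunday are 24.
        total += 16 if day.weekday() < 5 else 24
    return total


# Sample data copied from the question (hypothetical construction).
df = pd.DataFrame({
    'Start': pd.to_datetime(['2017-02-03', '2017-02-05', '2017-02-06', '2017-02-10']),
    'End': pd.to_datetime(['2017-03-15', '2017-03-16', '2017-03-17', '2017-03-18']),
})

# axis=1 passes one row at a time, so x['Start'] and x['End'] are scalars.
df['new_column'] = df.apply(lambda x: oncall_hours(x['Start'], x['End']), axis=1)
print(df)
```

With axis=1 each call receives scalar Timestamps rather than whole columns, which is what the original loop expects.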
IIUC, you can use a simple mapping like the one below.
Sample series:
In [110]: s = pd.date_range('2017-01-01', periods=10).to_series()
In [111]: s
Out[111]:
2017-01-01 2017-01-01
2017-01-02 2017-01-02
2017-01-03 2017-01-03
2017-01-04 2017-01-04
2017-01-05 2017-01-05
2017-01-06 2017-01-06
2017-01-07 2017-01-07
2017-01-08 2017-01-08
2017-01-09 2017-01-09
2017-01-10 2017-01-10
Freq: D, dtype: datetime64[ns]
Mapping:
# DateLikeSeries.dt.weekday returns the day of the week with Monday=0, Sunday=6
In [94]: mapping = {i:16 if i<5 else 24 for i in range(7)}
In [95]: mapping
Out[95]: {0: 16, 1: 16, 2: 16, 3: 16, 4: 16, 5: 24, 6: 24}
In [112]: s.dt.weekday.map(mapping)
Out[112]:
2017-01-01 24
2017-01-02 16
2017-01-03 16
2017-01-04 16
2017-01-05 16
2017-01-06 16
2017-01-07 24
2017-01-08 24
2017-01-09 16
2017-01-10 16
Freq: D, dtype: int64
In [113]: s.dt.weekday.map(mapping).sum()
Out[113]: 184
You can apply this logic to your DataFrame:
In [107]: df
Out[107]:
Start End
0 2017-02-03 2017-03-15
1 2017-02-05 2017-03-16
2 2017-02-06 2017-03-17
3 2017-02-10 2017-03-18
In [108]: %paste
df['oncall_hours'] = \
    df.apply(lambda x: pd.date_range(x['Start'], x['End'])
                         .to_series()
                         .dt.weekday
                         .map(mapping)
                         .sum(),
             axis=1)
## -- End pasted text --
In [109]: df
Out[109]:
Start End oncall_hours
0 2017-02-03 2017-03-15 752
1 2017-02-05 2017-03-16 728
2 2017-02-06 2017-03-17 720
3 2017-02-10 2017-03-18 680
You can use apply with a custom function:
df['new'] = df.apply(lambda x : np.where(pd.date_range(x['Start'], x['End']).weekday < 5, 16, 24).sum(), axis=1)
print (df)
Start End new
0 2017-02-03 2017-03-15 752
1 2017-02-05 2017-03-16 728
2 2017-02-06 2017-03-17 720
3 2017-02-10 2017-03-18 680
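This is the same as the function below, which:
- creates a range from the two dates with date_range
- then gets the weekday of each day
- then picks the hours by condition with numpy.where, and finally takes the sum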
def f(x):
    b = pd.date_range(x['Start'], x['End']).weekday
    return np.where(b < 5, 16, 24).sum()
df['new'] = df.apply(f, axis=1)
print (df)
Start End new
0 2017-02-03 2017-03-15 752
1 2017-02-05 2017-03-16 728
2 2017-02-06 2017-03-17 720
3 2017-02-10 2017-03-18 680
Another solution, but I think it is more complicated:
#reshape df
df1 = df.stack().reset_index()
df1.columns = ['i','c','date']
#groupby by index and resample to days, forward fill NaNs
df1 = (df1.set_index('date').groupby('i').resample('D').ffill()
          .reset_index(level=0, drop=True).reset_index())
#get hours
df1['tot'] = np.where(df1['date'].dt.weekday < 5, 16, 24)
#sum by index
s = df1.groupby('i')['tot'].sum()
#join to original
df = df.join(s)
print (df.head(10))
Start End tot
0 2017-02-03 2017-03-15 752
1 2017-02-05 2017-03-16 728
2 2017-02-06 2017-03-17 720
3 2017-02-10 2017-03-18 680
Timings:
df = pd.concat([df]*100).reset_index(drop=True)
print (df)
def f(df):
    df1 = df.stack().reset_index()
    df1.columns = ['i','c','date']
    df1 = df1.set_index('date').groupby('i').resample('D').ffill().reset_index(level=0, drop=True).reset_index()
    df1['tot'] = np.where(df1['date'].dt.weekday < 5, 16, 24)
    s = df1.groupby('i')['tot'].sum()
    return df.join(s)
print (f(df))
mapping = {i:16 if i<5 else 24 for i in range(7)}
In [190]: %timeit (f(df))
1 loop, best of 3: 482 ms per loop
#MaxU solution
In [191]: %timeit df['oncall_hours'] = df.apply(lambda x: pd.date_range(x['Start'], x['End']).to_series().dt.weekday.map(mapping).sum(), axis=1)
1 loop, best of 3: 531 ms per loop
In [192]: %timeit df['new'] = df.apply(lambda x : np.where(pd.date_range(x['Start'], x['End']).weekday < 5, 16, 24).sum(), axis=1)
10 loops, best of 3: 166 ms per loop
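All of the timed variants above still do per-row or per-group work, so a fully vectorized alternative may be worth trying on large frames. The sketch below is not taken from the answers above: it swaps in NumPy's busday_count to count the Monday-to-Friday days, and the name `oncall_hours_vectorized` is hypothetical. It assumes no holidays and has not been timed against the solutions above.

```python
import numpy as np
import pandas as pd


def oncall_hours_vectorized(start, end):
    """On-call hours for inclusive date ranges, without a per-row apply."""
    # busday_count works on day-resolution datetime64 values.
    s = start.to_numpy().astype('datetime64[D]')
    # busday_count excludes the end date, so shift it by one day.
    e = end.to_numpy().astype('datetime64[D]') + np.timedelta64(1, 'D')
    total_days = (e - s).astype('int64')
    weekdays = np.busday_count(s, e)  # default weekmask is Monday to Friday
    # 16 h per weekday + 24 h per weekend day == 24 * all days - 8 * weekdays
    return 24 * total_days - 8 * weekdays


# Sample data copied from the question (hypothetical construction).
df = pd.DataFrame({
    'Start': pd.to_datetime(['2017-02-03', '2017-02-05', '2017-02-06', '2017-02-10']),
    'End': pd.to_datetime(['2017-03-15', '2017-03-16', '2017-03-17', '2017-03-18']),
})
df['hours'] = oncall_hours_vectorized(df['Start'], df['End'])
print(df)
```

The arithmetic relies on every day being either a 16-hour weekday or a 24-hour weekend day, so only the weekday count and the total number of days are needed.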