How to add a date_range between two dates - Python Pandas
I want to handle time ranges that overlap several days. As you can see in my df, one row begins on 2019-10-25 and ends on 2019-10-27:
begin end info
2019-10-25 10:39:58.352073 2019-10-25 10:40:06.266782 toto
2019-10-25 16:35:22.485574 2019-10-27 09:50:31.713179 tata <------ HERE
2019-10-27 09:50:31.713179 2019-10-27 09:50:31.713192 titi
2019-10-28 14:04:33.095633 2019-10-28 14:05:07.639344 tete
Between these two dates I want to add as many periods as needed (date 00:00:00 to date 23:59:59.9), copying the info value into each one, like this:
2019-10-25 16:35:22.485574 2019-10-25 23:59:59.999999 tata
2019-10-26 00:00:00.000000 2019-10-26 23:59:59.999999 tata
2019-10-27 00:00:00.000000 2019-10-27 09:50:31.713179 tata
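- If the begin date differs from the end date => count the number of days
- Keep the begin and add a new end 'date 23:59:59.9'
- Add new date_range rows for the corresponding number of days
- Keep the end and add a new begin 'date 00:00:00.0'
- Fill in 'info'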
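The steps listed above can be sketched with pd.date_range. The idea is to generate one midnight for each calendar day the row touches. The first piece is then clamped to the original begin and the last piece to the original end. Finally, the pieces are exploded into rows so that info is copied into each one.

This is a minimal per-row sketch, not the vectorized method from the answer below. The helper names split_by_day and split_frame are hypothetical. The sketch assumes that begin and end are already Timestamps. It also assumes that a day closes at 23:59:59.999999, as in the expected output. If 23:59:59.9 is really what you want, subtract 100 ms instead of 1 microsecond.

```python
import pandas as pd


def split_by_day(begin, end):
    """Hypothetical helper: split one [begin, end] interval into per-day pieces.

    Assumption: a day is closed at 23:59:59.999999 (next midnight minus 1 us),
    as in the expected output, rather than the 23:59:59.9 mentioned in the text.
    """
    # Same calendar day: keep the row unchanged
    if begin.normalize() == end.normalize():
        return [(begin, end)]
    # One midnight per calendar day touched by the interval (both ends inclusive)
    days = pd.date_range(begin.normalize(), end.normalize(), freq="D")
    last_instant = pd.Timedelta(days=1) - pd.Timedelta(microseconds=1)
    # The first piece keeps the original begin, the last keeps the original end,
    # and every other bound falls on a day boundary
    return [(max(begin, day), min(end, day + last_instant)) for day in days]


def split_frame(df):
    """Hypothetical helper: apply split_by_day to every row, copying 'info'."""
    pieces = pd.Series(
        [split_by_day(b, e) for b, e in zip(df["begin"], df["end"])],
        index=df.index,
    )
    # explode() turns each list of (begin, end) pairs into one row per pair
    out = df.assign(piece=pieces).explode("piece", ignore_index=True)
    out["begin"] = [p[0] for p in out["piece"]]
    out["end"] = [p[1] for p in out["piece"]]
    return out.drop(columns="piece")


# Sample data from the question
df = pd.DataFrame({
    "begin": pd.to_datetime([
        "2019-10-25 10:39:58.352073", "2019-10-25 16:35:22.485574",
        "2019-10-27 09:50:31.713179", "2019-10-28 14:04:33.095633",
    ]),
    "end": pd.to_datetime([
        "2019-10-25 10:40:06.266782", "2019-10-27 09:50:31.713179",
        "2019-10-27 09:50:31.713192", "2019-10-28 14:05:07.639344",
    ]),
    "info": ["toto", "tata", "titi", "tete"],
})

result = split_frame(df)
print(result)
```

This version loops in Python once per row. That makes it easy to follow, but it will be slower than a vectorized approach on large frames.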
Final expected result:
begin end info
2019-10-25 10:39:58.352073 2019-10-25 10:40:06.266782 toto
2019-10-25 16:35:22.485574 2019-10-25 23:59:59.999999 tata
2019-10-26 00:00:00.000000 2019-10-26 23:59:59.999999 tata
2019-10-27 00:00:00.000000 2019-10-27 09:50:31.713179 tata
2019-10-27 09:50:31.713179 2019-10-27 09:50:31.713192 titi
2019-10-28 14:04:33.095633 2019-10-28 14:05:07.639344 tete
But I don't know how to implement the date_range, fill in the info, and add the right number of rows.
Thanks for your time.
Assuming begin and end are already of Timestamp type:
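import numpy as np
import pandas as pd

# Generate a series of Timedeltas for each row: 0 days, 1 day, ... one per calendar day spanned
n = (
    (df['end'].dt.normalize() - df['begin'].dt.normalize())
    .apply(lambda d: [pd.Timedelta(days=i) for i in range(d.days + 1)])
    .explode()
    .astype('timedelta64[ns]')  # explode() returns object dtype; restore a timedelta dtype
).rename('n')
# Joining on the index repeats each original row once per day
df = df.join(n)

# Adjust the begin and end of each row
adjusted_begin = np.max([
    df['begin'],
    df['begin'].dt.normalize() + df['n']
], axis=0)
# Day closes at 23:59:59.9 as in the question text; pd.Timedelta(days=1, microseconds=-1) gives 23:59:59.999999
adjusted_end = np.min([
    df['end'],
    pd.Series(adjusted_begin).dt.normalize() + pd.Timedelta(days=1, milliseconds=-100)
], axis=0)

# Final assembly
df = df.assign(begin_=adjusted_begin, end_=adjusted_end)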
Result:
begin end info n begin_ end_
0 2019-10-25 10:39:58.352073 2019-10-25 10:40:06.266782 toto 0 days 2019-10-25 10:39:58.352073 2019-10-25 10:40:06.266782
1 2019-10-25 16:35:22.485574 2019-10-27 09:50:31.713179 tata 0 days 2019-10-25 16:35:22.485574 2019-10-25 23:59:59.900000
1 2019-10-25 16:35:22.485574 2019-10-27 09:50:31.713179 tata 1 days 2019-10-26 00:00:00.000000 2019-10-26 23:59:59.900000
1 2019-10-25 16:35:22.485574 2019-10-27 09:50:31.713179 tata 2 days 2019-10-27 00:00:00.000000 2019-10-27 09:50:31.713179
2 2019-10-27 09:50:31.713179 2019-10-27 09:50:31.713192 titi 0 days 2019-10-27 09:50:31.713179 2019-10-27 09:50:31.713192
3 2019-10-28 14:04:33.095633 2019-10-28 14:05:07.639344 tete 0 days 2019-10-28 14:04:33.095633 2019-10-28 14:05:07.639344
Then trim off the columns you don't need.
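Here is a minimal sketch of that trim step. To keep it self-contained, it uses only the three tata rows copied from the result table above; on a real run, the same chain applies to the full df. It assumes you want two things: the adjusted bounds back under the original names begin and end, and a fresh 0..n-1 index in place of the duplicated one.

```python
import pandas as pd

# The three 'tata' rows of the answer's output, copied from the result table:
# duplicated index 1, the original bounds, the helper offset n and the new bounds
df = pd.DataFrame(
    {
        "begin": pd.to_datetime(["2019-10-25 16:35:22.485574"] * 3),
        "end": pd.to_datetime(["2019-10-27 09:50:31.713179"] * 3),
        "info": ["tata"] * 3,
        "n": pd.to_timedelta([0, 1, 2], unit="D"),
        "begin_": pd.to_datetime([
            "2019-10-25 16:35:22.485574",
            "2019-10-26 00:00:00.000000",
            "2019-10-27 00:00:00.000000",
        ]),
        "end_": pd.to_datetime([
            "2019-10-25 23:59:59.900000",
            "2019-10-26 23:59:59.900000",
            "2019-10-27 09:50:31.713179",
        ]),
    },
    index=[1, 1, 1],
)

# Drop the original bounds and the helper offset, move the adjusted bounds back
# under the original column names, and renumber the duplicated index
result = (
    df.drop(columns=["begin", "end", "n"])
    .rename(columns={"begin_": "begin", "end_": "end"})
    .loc[:, ["begin", "end", "info"]]
    .reset_index(drop=True)
)
print(result)
```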