Create new rows and repeat the values based on time interval if they belong to
I have a Pandas DataFrame with many columns. Two of them are timestamps (start and end).
start end value string
2021-12-01 14:00:00 2021-12-01 14:35:00 1 a
2021-12-01 17:00:00 2021-12-01 17:30:00 2 b
2021-12-01 14:00:00 2021-12-01 16:00:00 3 c
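I need to normalize the timestamps (the time column below) to 5-minute steps, repeating the values of the other columns (value and string) for every step that falls within the same interval, like this: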
time start end value string
2021-12-01 14:00:00 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a
2021-12-01 14:05:00 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a
2021-12-01 14:10:00 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a
2021-12-01 14:15:00 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a
2021-12-01 14:20:00 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a
2021-12-01 14:25:00 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a
2021-12-01 14:30:00 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a
2021-12-01 17:00:00 2021-12-01 17:00:00 2021-12-01 17:30:00 2 b
2021-12-01 17:05:00 2021-12-01 17:00:00 2021-12-01 17:30:00 2 b
....
The intervals overlap a lot, so I can't use df.resample with a DatetimeIndex.
You can create a date range for each row with pd.date_range and then explode it:
import pandas as pd
new_df = df.assign(time=df.apply(lambda x: pd.date_range(x['start'], x['end'], freq='5min'), axis=1)).explode('time').reset_index(drop=True)
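Output (pd.date_range includes the end timestamp by default, hence the 14:35:00 row; in pandas 1.4+ pass inclusive='left' to drop it and match the desired output):
>>> new_df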
start end value string time
0 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a 2021-12-01 14:00:00
1 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a 2021-12-01 14:05:00
2 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a 2021-12-01 14:10:00
3 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a 2021-12-01 14:15:00
4 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a 2021-12-01 14:20:00
5 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a 2021-12-01 14:25:00
6 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a 2021-12-01 14:30:00
7 2021-12-01 14:00:00 2021-12-01 14:35:00 1 a 2021-12-01 14:35:00
8 2021-12-01 17:00:00 2021-12-01 17:30:00 2 b 2021-12-01 17:00:00
9 2021-12-01 17:00:00 2021-12-01 17:30:00 2 b 2021-12-01 17:05:00
...
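A possible alternative, swapped in here rather than taken from the answer above: instead of a row-wise apply, repeat each row once per 5-minute step with np.repeat and add a per-row step counter from GroupBy.cumcount. This sketch assumes start and end are already datetime64 columns and uses the half-open interval [start, end), which matches the question's desired output (14:35:00 is not listed for row a). Rows with start == end produce no output rows here. The sample frame is rebuilt from the question's table.

```python
import numpy as np
import pandas as pd

# Sample data from the question (start/end parsed as datetimes).
df = pd.DataFrame({
    "start": pd.to_datetime(["2021-12-01 14:00:00", "2021-12-01 17:00:00", "2021-12-01 14:00:00"]),
    "end": pd.to_datetime(["2021-12-01 14:35:00", "2021-12-01 17:30:00", "2021-12-01 16:00:00"]),
    "value": [1, 2, 3],
    "string": ["a", "b", "c"],
})

step = pd.Timedelta("5min")

# Number of 5-minute ticks in the half-open interval [start, end): ceil((end - start) / step).
n = np.ceil((df["end"] - df["start"]) / step).astype(int).to_numpy()

# Original row position for every output row, e.g. [0]*7 + [1]*6 + [2]*24.
pos = np.repeat(np.arange(len(df)), n)
out = df.iloc[pos].reset_index(drop=True)

# Counter 0, 1, 2, ... within each original row, turned into a time offset from start.
k = out.groupby(pos).cumcount()
out.insert(0, "time", out["start"] + k * step)

print(out.head(8))
```

Because it avoids building one DatetimeIndex per row in Python, this may scale better on large frames, but the date_range/explode version is easier to read.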