Getting a count of missing timestamps in a dataset based on expected interval
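I have a dataset of timestamped pressure readings that should be recorded and returned every 15 minutes. The sample data below shows that the dataset contains gaps longer than 15 minutes.
I have been trying to find a way to add a count column that records how many readings were missed between the readings received, e.g. a 15-minute gap = 0 missed readings, a half-hour gap = 1 missed reading, a 45-minute gap = 2, and so on.
I have no code worth showing at this stage because I am still a long way off. I have been working through this post, but without success: How Can I Detect Gaps and Consecutive Periods In A Time Series In Pandas
Any pointers would be much appreciated.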
Place      date              pressure (m)
Somewhere  01/09/2019 00:00  34
Somewhere  01/09/2019 00:30  34
Somewhere  01/09/2019 00:45  34
Somewhere  01/09/2019 01:15  34
Somewhere  01/09/2019 01:30  34
Somewhere  01/09/2019 02:15  34
Somewhere  01/09/2019 02:30  34
Somewhere  01/09/2019 02:45  34
Somewhere  01/09/2019 03:15  34
Somewhere  01/09/2019 03:30  34
Somewhere  01/09/2019 03:45  34.5
Somewhere  01/09/2019 04:00  34
Somewhere  01/09/2019 04:15  34
Somewhere  01/09/2019 06:45  33.5
Somewhere  01/09/2019 07:00  33.5
Somewhere  01/09/2019 07:30  34
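Use pd.to_datetime to convert the date column into a pandas datetime series, then use Series.diff to calculate the successive differences between the dates. Divide these differences by a pd.Timedelta of 15 minutes to get the number of intervals elapsed, subtract 1 so that an on-time reading counts as 0, and finally use .fillna to fill the NaN value in the first row (which has no previous reading) with 0: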
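import pandas as pd
df['date'] = pd.to_datetime(df['date'])
# 15-minute intervals since the previous reading, minus the one expected interval
df['gap'] = (df['date'].diff() / pd.Timedelta(minutes=15)).sub(1).fillna(0)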
Result:
# print(df)
        Place                date  pressure (m)  gap
0   Somewhere 2019-01-09 00:00:00          34.0  0.0
1   Somewhere 2019-01-09 00:30:00          34.0  1.0
2   Somewhere 2019-01-09 00:45:00          34.0  0.0
3   Somewhere 2019-01-09 01:15:00          34.0  1.0
4   Somewhere 2019-01-09 01:30:00          34.0  0.0
5   Somewhere 2019-01-09 02:15:00          34.0  2.0
6   Somewhere 2019-01-09 02:30:00          34.0  0.0
7   Somewhere 2019-01-09 02:45:00          34.0  0.0
8   Somewhere 2019-01-09 03:15:00          34.0  1.0
9   Somewhere 2019-01-09 03:30:00          34.0  0.0
10  Somewhere 2019-01-09 03:45:00          34.5  0.0
11  Somewhere 2019-01-09 04:00:00          34.0  0.0
12  Somewhere 2019-01-09 04:15:00          34.0  0.0
13  Somewhere 2019-01-09 06:45:00          33.5  9.0
14  Somewhere 2019-01-09 07:00:00          33.5  0.0
15  Somewhere 2019-01-09 07:30:00          34.0  1.0
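
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea on a few rows of the sample data. Several things in it are my assumptions rather than part of the answer above: the explicit "%m/%d/%Y %H:%M" format (chosen because it matches the month-first dates in the printed result; swap day and month if your data is day-first), the sort and per-Place groupby (only relevant if the real data holds more than one site or arrives unordered), the cast to int (safe only when every timestamp sits exactly on the 15-minute grid), and the `expected` and `total_missing` names.

```python
import pandas as pd

# A handful of rows from the question's sample data, same column names.
df = pd.DataFrame({
    "Place": ["Somewhere"] * 5,
    "date": ["01/09/2019 00:00", "01/09/2019 00:30", "01/09/2019 00:45",
             "01/09/2019 01:30", "01/09/2019 04:00"],
    "pressure (m)": [34, 34, 34, 34, 34],
})

# Assumption: month-first dates, as in the printed result above.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y %H:%M")

# Expected spacing between consecutive readings.
expected = pd.Timedelta(minutes=15)

# Sort, then diff within each Place so a gap is never measured across sites.
df = df.sort_values(["Place", "date"]).reset_index(drop=True)
elapsed = df.groupby("Place")["date"].diff()

# Intervals elapsed minus the one expected interval; the first row of each
# Place has no previous reading (NaT), so it is filled with 0.
df["gap"] = (elapsed / expected).sub(1).fillna(0).astype(int)

# Total number of missed readings across the sample.
total_missing = int(df["gap"].sum())

print(df)
print("missed readings:", total_missing)
```

Summing the gap column gives the overall count of missed readings, which is handy as a quick data-quality check per site.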