Search for missing timestamps and display them in Python?
Here is part of my dataset, which contains a timestamp, Temperature1 and Temperature2:
Timestamp            Temperature1  Temperature2
09/01/2016 00:00:08  53.4          45.5
09/01/2016 00:00:38  53.5          45.2
09/01/2016 00:01:08  54.6          43.2
09/01/2016 00:01:38  55.2          46.3
09/01/2016 00:02:08  54.5          45.5
09/01/2016 00:04:08  54.2          35.5
09/01/2016 00:05:08  52.4          45.7
09/01/2016 00:05:38  53.4          45.2
A reading should arrive every 30 seconds, but some timestamps are missing from this dataset, so some data points have been lost.
How can I find those missing data points and insert NaN for them?
Please help.
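Assuming the timestamps have been converted to datetime, if you set the index to the timestamp column and then reindex with a date range, the missing values will show up: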
In [94]:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.set_index('Timestamp')
df
Out[94]:
Temperature1 Temperature2
Timestamp
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
In [96]:
df.reindex(pd.date_range(start=df.index[0], end=df.index[-1], freq='30s'))
Out[96]:
Temperature1 Temperature2
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:02:38 NaN NaN
2016-09-01 00:03:08 NaN NaN
2016-09-01 00:03:38 NaN NaN
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:04:38 NaN NaN
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
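For anyone who wants to reproduce the steps above from scratch, here is a minimal self-contained sketch. The inline CSV text and the explicit month-first `format` string are assumptions made for illustration (the question does not say how the data is loaded); the names `raw`, `full_index` and `filled` are likewise hypothetical.

```python
import io
import pandas as pd

# Sample data from the question, written as CSV for illustration (assumed layout).
raw = """Timestamp,Temperature1,Temperature2
09/01/2016 00:00:08,53.4,45.5
09/01/2016 00:00:38,53.5,45.2
09/01/2016 00:01:08,54.6,43.2
09/01/2016 00:01:38,55.2,46.3
09/01/2016 00:02:08,54.5,45.5
09/01/2016 00:04:08,54.2,35.5
09/01/2016 00:05:08,52.4,45.7
09/01/2016 00:05:38,53.4,45.2
"""

df = pd.read_csv(io.StringIO(raw))

# Assumption: dates are month-first, matching the 2016-09-01 output shown above.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%m/%d/%Y %H:%M:%S")
df = df.set_index("Timestamp")

# Every timestamp we expect to see, 30 seconds apart, from first to last reading.
full_index = pd.date_range(start=df.index[0], end=df.index[-1], freq="30s")

# Rows for labels absent from the original index are filled with NaN.
filled = df.reindex(full_index)
print(filled)
```

Note that `reindex` only matches labels exactly, so this relies on the readings landing precisely on the 30-second grid.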
Assuming the timestamps are regular, here we construct a date range from the first and last timestamp values with a frequency of 30 seconds:
In [97]:
pd.date_range(start=df.index[0], end=df.index[-1], freq='30s')
Out[97]:
DatetimeIndex(['2016-09-01 00:00:08', '2016-09-01 00:00:38',
'2016-09-01 00:01:08', '2016-09-01 00:01:38',
'2016-09-01 00:02:08', '2016-09-01 00:02:38',
'2016-09-01 00:03:08', '2016-09-01 00:03:38',
'2016-09-01 00:04:08', '2016-09-01 00:04:38',
'2016-09-01 00:05:08', '2016-09-01 00:05:38'],
dtype='datetime64[ns]', freq='30S')
When you reindex with this date range, any missing index labels become NaN values.
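If the goal is only to list which timestamps are missing, rather than to insert NaN rows, one possible approach is to take the set difference between the expected date range and the observed index. This is a sketch under the same regular-grid assumption; `observed`, `expected` and `missing` are illustrative names.

```python
import pandas as pd

# Timestamps actually present in the question's data.
observed = pd.to_datetime([
    "2016-09-01 00:00:08", "2016-09-01 00:00:38",
    "2016-09-01 00:01:08", "2016-09-01 00:01:38",
    "2016-09-01 00:02:08", "2016-09-01 00:04:08",
    "2016-09-01 00:05:08", "2016-09-01 00:05:38",
])

# Timestamps we would expect if a reading arrived every 30 seconds.
expected = pd.date_range(start=observed[0], end=observed[-1], freq="30s")

# Labels that are expected but were never observed.
missing = expected.difference(observed)
print(missing)
```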
You can use the resample('30S', base=8) method (on pandas 1.1 and later, base is deprecated in favor of offset and origin):
In [20]: x.resample('30S', base=8).mean()
Out[20]:
Temperature1 Temperature2
Timestamp
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:02:38 NaN NaN
2016-09-01 00:03:08 NaN NaN
2016-09-01 00:03:38 NaN NaN
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:04:38 NaN NaN
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
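The `base` argument is no longer accepted as of pandas 2.0, so on current versions the same result can be obtained with `offset`. The following is a sketch assuming pandas 1.1 or later; the DataFrame is rebuilt inline so the example runs on its own, and `resampled` is an illustrative name.

```python
import pandas as pd

idx = pd.to_datetime([
    "2016-09-01 00:00:08", "2016-09-01 00:00:38",
    "2016-09-01 00:01:08", "2016-09-01 00:01:38",
    "2016-09-01 00:02:08", "2016-09-01 00:04:08",
    "2016-09-01 00:05:08", "2016-09-01 00:05:38",
])
df = pd.DataFrame(
    {
        "Temperature1": [53.4, 53.5, 54.6, 55.2, 54.5, 54.2, 52.4, 53.4],
        "Temperature2": [45.5, 45.2, 43.2, 46.3, 45.5, 35.5, 45.7, 45.2],
    },
    index=idx,
)
df.index.name = "Timestamp"

# offset="8s" shifts the 30-second bins so they start at :08 and :38,
# which is the role base=8 played in older pandas versions.
resampled = df.resample("30s", offset="8s").mean()
print(resampled)
```

Each bin holds at most one reading here, so `mean()` returns that reading unchanged and yields NaN for the empty bins.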
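The solution above assumes that Timestamp is of datetime dtype and that it has been set as the index.
If Timestamp is a regular column (not the index), then starting from pandas 0.19.0 we can resample on a regular column (it must be of datetime dtype) using the on='column_name' parameter: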
In [26]: x.resample('30S', on='Timestamp', base=8).mean()
Out[26]:
Temperature1 Temperature2
Timestamp
2016-09-01 00:00:08 53.4 45.5
2016-09-01 00:00:38 53.5 45.2
2016-09-01 00:01:08 54.6 43.2
2016-09-01 00:01:38 55.2 46.3
2016-09-01 00:02:08 54.5 45.5
2016-09-01 00:02:38 NaN NaN
2016-09-01 00:03:08 NaN NaN
2016-09-01 00:03:38 NaN NaN
2016-09-01 00:04:08 54.2 35.5
2016-09-01 00:04:38 NaN NaN
2016-09-01 00:05:08 52.4 45.7
2016-09-01 00:05:38 53.4 45.2
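A version of the `on=` variant for recent pandas might look like the sketch below, again swapping `base=8` for `offset="8s"` (assuming pandas 1.1 or later). The trailing `reset_index()` is an optional addition, not part of the original answer, that turns Timestamp back into a regular column.

```python
import pandas as pd

# Timestamp is kept as an ordinary datetime column, not the index.
df = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime([
            "2016-09-01 00:00:08", "2016-09-01 00:00:38",
            "2016-09-01 00:01:08", "2016-09-01 00:01:38",
            "2016-09-01 00:02:08", "2016-09-01 00:04:08",
            "2016-09-01 00:05:08", "2016-09-01 00:05:38",
        ]),
        "Temperature1": [53.4, 53.5, 54.6, 55.2, 54.5, 54.2, 52.4, 53.4],
        "Temperature2": [45.5, 45.2, 43.2, 46.3, 45.5, 35.5, 45.7, 45.2],
    }
)

# Resample on the column; the result is indexed by the 30-second bins.
resampled = df.resample("30s", on="Timestamp", offset="8s").mean()

# Optional: restore Timestamp as a regular column.
flat = resampled.reset_index()
print(flat)
```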
If you need to find your base value dynamically, you can do it this way:
In [21]: x.index[0].second
Out[21]: 8
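On pandas 1.1 or later, the same idea can be expressed either by building an `offset` from the first timestamp or by passing `origin="start"`, which anchors the bins at the first value of the series. The sketch below shows both under the assumption that the first reading sits on the intended grid; `by_offset` and `by_origin` are illustrative names, and the `% 30` simply keeps the offset within one 30-second period.

```python
import pandas as pd

idx = pd.to_datetime([
    "2016-09-01 00:00:08", "2016-09-01 00:00:38",
    "2016-09-01 00:01:08", "2016-09-01 00:01:38",
    "2016-09-01 00:02:08", "2016-09-01 00:04:08",
    "2016-09-01 00:05:08", "2016-09-01 00:05:38",
])
s = pd.Series([53.4, 53.5, 54.6, 55.2, 54.5, 54.2, 52.4, 53.4], index=idx)

# Option 1: derive the offset from the seconds of the first timestamp.
offset = pd.Timedelta(seconds=s.index[0].second % 30)
by_offset = s.resample("30s", offset=offset).mean()

# Option 2: let pandas anchor the bins at the first timestamp directly.
by_origin = s.resample("30s", origin="start").mean()

print(by_offset)
print(by_origin)
```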
From the docs:
base : int, default 0
For frequencies that evenly subdivide 1 day, the "origin" of the aggregated intervals. For example, for 5min frequency, base could range from 0 through 4. Defaults to 0