Find missing timestamps and display them in Python?

Here is part of my dataset, which contains a timestamp, Temperature1, and Temperature2:

Timestamp              Temperature1   Temperature2
09/01/2016 00:00:08    53.4           45.5
09/01/2016 00:00:38    53.5           45.2
09/01/2016 00:01:08    54.6           43.2
09/01/2016 00:01:38    55.2           46.3
09/01/2016 00:02:08    54.5           45.5
09/01/2016 00:04:08    54.2           35.5
09/01/2016 00:05:08    52.4           45.7
09/01/2016 00:05:38    53.4           45.2

My data arrives every 30 seconds.

Some timestamps are missing from this dataset, so some data points have been lost. How can I find the missing data points and insert NaN in their place? Please help.

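Assuming the timestamps have been converted to datetime, set the Timestamp column as the index and then reindex against a date range; the missing values will then show up as NaN: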

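In [94]:
import pandas as pd  # df is assumed to already hold the data shown in the question
df['Timestamp'] = pd.to_datetime(df['Timestamp'])  # parse the strings into datetime64 values
df = df.set_index('Timestamp')  # reindex works on the index, so move Timestamp there
df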

Out[94]:
                     Temperature1  Temperature2
Timestamp                                      
2016-09-01 00:00:08          53.4          45.5
2016-09-01 00:00:38          53.5          45.2
2016-09-01 00:01:08          54.6          43.2
2016-09-01 00:01:38          55.2          46.3
2016-09-01 00:02:08          54.5          45.5
2016-09-01 00:04:08          54.2          35.5
2016-09-01 00:05:08          52.4          45.7
2016-09-01 00:05:38          53.4          45.2

In [96]:
df.reindex(pd.date_range(start=df.index[0], end=df.index[-1], freq='30s'))

Out[96]:
                     Temperature1  Temperature2
2016-09-01 00:00:08          53.4          45.5
2016-09-01 00:00:38          53.5          45.2
2016-09-01 00:01:08          54.6          43.2
2016-09-01 00:01:38          55.2          46.3
2016-09-01 00:02:08          54.5          45.5
2016-09-01 00:02:38           NaN           NaN
2016-09-01 00:03:08           NaN           NaN
2016-09-01 00:03:38           NaN           NaN
2016-09-01 00:04:08          54.2          35.5
2016-09-01 00:04:38           NaN           NaN
2016-09-01 00:05:08          52.4          45.7
2016-09-01 00:05:38          53.4          45.2

Assuming the timestamps are regular, here we build a date range from the first and last timestamp values with a frequency of 30 seconds:

In [97]:
pd.date_range(start=df.index[0], end=df.index[-1], freq='30s')

Out[97]:
DatetimeIndex(['2016-09-01 00:00:08', '2016-09-01 00:00:38',
               '2016-09-01 00:01:08', '2016-09-01 00:01:38',
               '2016-09-01 00:02:08', '2016-09-01 00:02:38',
               '2016-09-01 00:03:08', '2016-09-01 00:03:38',
               '2016-09-01 00:04:08', '2016-09-01 00:04:38',
               '2016-09-01 00:05:08', '2016-09-01 00:05:38'],
              dtype='datetime64[ns]', freq='30S')

When you reindex with this date range, any index label that is missing from the original data becomes a row of NaN.
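The question also asks how to find the missing timestamps themselves. A minimal, self-contained sketch, assuming the sample data from the question and month-first dates (the names `expected`, `missing`, and `filled` are illustrative only):

```python
import pandas as pd

# Sample data copied from the question.
times = ["00:00:08", "00:00:38", "00:01:08", "00:01:38",
         "00:02:08", "00:04:08", "00:05:08", "00:05:38"]
df = pd.DataFrame({
    "Timestamp": ["09/01/2016 " + t for t in times],
    "Temperature1": [53.4, 53.5, 54.6, 55.2, 54.5, 54.2, 52.4, 53.4],
    "Temperature2": [45.5, 45.2, 43.2, 46.3, 45.5, 35.5, 45.7, 45.2],
})

# Assumption: the dates are month-first (September 1, 2016).
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%m/%d/%Y %H:%M:%S")
df = df.set_index("Timestamp")

# Every timestamp we expect to see at a regular 30-second spacing.
expected = pd.date_range(start=df.index[0], end=df.index[-1], freq="30s")

# Timestamps that are expected but absent from the data.
missing = expected.difference(df.index)

# The same frame with NaN rows inserted at the missing timestamps.
filled = df.reindex(expected)

print(missing)
print(filled)
```

Note that `reindex` matches labels exactly, so this only works when the readings really do fall on the 30-second grid; a reading that arrives a second late would not match any expected label and would be dropped.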

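You can use the resample('30S', base=8) method (note that base was deprecated in pandas 1.1 and removed in 2.0; on newer versions, offset='8s' plays the same role):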

In [20]: x.resample('30S', base=8).mean()
Out[20]:
                                 Temperature1    Temperature2
Timestamp
2016-09-01 00:00:08                      53.4            45.5
2016-09-01 00:00:38                      53.5            45.2
2016-09-01 00:01:08                      54.6            43.2
2016-09-01 00:01:38                      55.2            46.3
2016-09-01 00:02:08                      54.5            45.5
2016-09-01 00:02:38                       NaN             NaN
2016-09-01 00:03:08                       NaN             NaN
2016-09-01 00:03:38                       NaN             NaN
2016-09-01 00:04:08                      54.2            35.5
2016-09-01 00:04:38                       NaN             NaN
2016-09-01 00:05:08                      52.4            45.7
2016-09-01 00:05:38                      53.4            45.2
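On recent pandas versions the `base` argument is no longer available, so the call above will not run as written. A rough equivalent, assuming pandas 1.1 or later and the sample data from the question, uses the `offset` argument instead (`x` is rebuilt here only to keep the sketch self-contained):

```python
import pandas as pd

# Sample data from the question, with Timestamp as the index.
times = ["00:00:08", "00:00:38", "00:01:08", "00:01:38",
         "00:02:08", "00:04:08", "00:05:08", "00:05:38"]
x = pd.DataFrame(
    {
        "Temperature1": [53.4, 53.5, 54.6, 55.2, 54.5, 54.2, 52.4, 53.4],
        "Temperature2": [45.5, 45.2, 43.2, 46.3, 45.5, 35.5, 45.7, 45.2],
    },
    index=pd.to_datetime(["2016-09-01 " + t for t in times]),
)
x.index.name = "Timestamp"

# Shift the 30-second bins by 8 seconds so they start at :08 and :38.
# Bins that contain no readings get NaN from mean().
resampled = x.resample("30s", offset="8s").mean()
print(resampled)
```

Unlike `reindex`, `resample` groups readings into bins, so a reading that arrives slightly late still lands in its 30-second bin (and is labelled with the bin's start time).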

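The solution above assumes that Timestamp has a datetime dtype and has been set as the index. If Timestamp is a regular column (not the index), then starting from Pandas 0.19.0 we can resample on a regular column (it must have a datetime dtype) using the on='column_name' parameter: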

In [26]: x.resample('30S', on='Timestamp', base=8).mean()
Out[26]:
                                 Temperature1    Temperature2
Timestamp
2016-09-01 00:00:08                      53.4            45.5
2016-09-01 00:00:38                      53.5            45.2
2016-09-01 00:01:08                      54.6            43.2
2016-09-01 00:01:38                      55.2            46.3
2016-09-01 00:02:08                      54.5            45.5
2016-09-01 00:02:38                       NaN             NaN
2016-09-01 00:03:08                       NaN             NaN
2016-09-01 00:03:38                       NaN             NaN
2016-09-01 00:04:08                      54.2            35.5
2016-09-01 00:04:38                       NaN             NaN
2016-09-01 00:05:08                      52.4            45.7
2016-09-01 00:05:38                      53.4            45.2
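For reference, here is a self-contained sketch of resampling on a regular column. It assumes pandas 1.1 or later, so it uses `offset='8s'` in place of `base=8`, and it rebuilds `x` from the question's sample data:

```python
import pandas as pd

# Sample data from the question; Timestamp stays a regular column here.
times = ["00:00:08", "00:00:38", "00:01:08", "00:01:38",
         "00:02:08", "00:04:08", "00:05:08", "00:05:38"]
x = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2016-09-01 " + t for t in times]),
    "Temperature1": [53.4, 53.5, 54.6, 55.2, 54.5, 54.2, 52.4, 53.4],
    "Temperature2": [45.5, 45.2, 43.2, 46.3, 45.5, 35.5, 45.7, 45.2],
})

# on="Timestamp" tells resample which datetime column to bin by;
# that column becomes the index of the result.
resampled = x.resample("30s", on="Timestamp", offset="8s").mean()
print(resampled)
```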

If you need to determine your base value dynamically, you can take the seconds component of the first timestamp:

In [21]: x.index[0].second
Out[21]: 8
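The same idea can be sketched for newer pandas versions, where `offset` and `origin` replace `base`. This assumes pandas 1.1 or later; the variable names are illustrative, and the shortened sample below keeps only a few readings around the gap:

```python
import pandas as pd

# A shortened sample with one gap (00:02:38 through 00:03:38 are missing).
x = pd.DataFrame(
    {"Temperature1": [54.6, 55.2, 54.5, 54.2]},
    index=pd.to_datetime([
        "2016-09-01 00:01:08", "2016-09-01 00:01:38",
        "2016-09-01 00:02:08", "2016-09-01 00:04:08",
    ]),
)

freq = "30s"
first = x.index[0]

# How far the first reading sits past the previous 30-second boundary.
offset = first - first.floor(freq)  # Timedelta of 8 seconds here

by_offset = x.resample(freq, offset=offset).mean()

# Alternative: anchor the bins directly at the first timestamp.
by_origin = x.resample(freq, origin="start").mean()

print(offset)
print(by_offset)
```

Both calls should produce the same bins for this data; `origin="start"` avoids computing the offset by hand.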

From the docs:

base : int, default 0

For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for 5min frequency, base could range from 0 through 4.

Defaults to 0