Upsample timestamps in Python while keeping the original points
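I have an unevenly spaced time series of arbitrary size (e.g. 7 points, see below) that I want to upsample (e.g. to 50 points) while keeping the original points.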
ls = ['2016-01-30 12:10:00',
'2016-01-30 12:23:35',
'2016-01-30 12:24:14',
'2016-01-30 12:24:51',
'2016-01-30 12:25:00',
'2016-01-30 12:26:49',
'2016-01-30 12:27:36']
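It doesn't really matter between which points the new timestamps are placed, but ideally they would be distributed according to the time intervals: the larger an interval, the more new timestamps it gets, e.g.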
new_ls = ['2016-01-30 12:10:00',
x,
x,
x,
x,
'2016-01-30 12:23:35',
x,
x,
'2016-01-30 12:24:14',
'2016-01-30 12:24:51',
'2016-01-30 12:25:00',
x,
'2016-01-30 12:26:49',
'2016-01-30 12:27:36']
The resulting list may be unevenly spaced as well.
Thanks in advance!
First convert your list to a pd.DatetimeIndex
and compute an ideal, evenly spaced distribution of timestamps between the start and the end:
import pandas as pd
import numpy as np
ls = pd.to_datetime(['2016-01-30 12:10:00',
'2016-01-30 12:23:35',
'2016-01-30 12:24:14',
'2016-01-30 12:24:51',
'2016-01-30 12:25:00',
'2016-01-30 12:26:49',
'2016-01-30 12:27:36'])
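n = 50  # number of equal steps between the first and the last timestamp
dt = (ls.max() - ls.min())/n  # step size as a pd.Timedelta
ls_temp = pd.date_range(start=ls.min(), end=ls.max(), freq=dt)  # n + 1 evenly spaced timestamps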
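Since `dt` is just the total span divided by `n`, the same evenly spaced grid can, as far as I know, also be built with the `periods` argument of `pd.date_range` instead of a computed frequency: `n + 1` periods give `n` equal steps including both endpoints. A minimal sketch (the name `grid` is only for illustration):

```python
import pandas as pd

ls = pd.to_datetime(['2016-01-30 12:10:00',
                     '2016-01-30 12:23:35',
                     '2016-01-30 12:24:14',
                     '2016-01-30 12:24:51',
                     '2016-01-30 12:25:00',
                     '2016-01-30 12:26:49',
                     '2016-01-30 12:27:36'])
n = 50

# n + 1 evenly spaced timestamps from the first to the last original point,
# i.e. n equal steps of (max - min) / n, endpoints included
grid = pd.date_range(start=ls.min(), end=ls.max(), periods=n + 1)

print(len(grid))  # 51
```

With this data the step works out to 1056 s / 50 = 21.12 s, which matches the spacing in the output further down.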
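Then figure out where to put the original measurements, e.g. by replacing the grid entries with the smallest absolute difference: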
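# distance from every original timestamp (rows) to every grid point (columns);
# .values gives plain numpy datetime64 arrays, which support [:, None] broadcasting
# (newer pandas versions no longer allow multi-dimensional indexing on a DatetimeIndex)
idx = np.abs(ls.values[:, None] - ls_temp.values[None, :]).argmin(axis=1)
ls_temp = pd.Series(ls_temp)
# overwrite the nearest grid point with each original timestamp (by position)
ls_temp.iloc[idx] = ls.values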
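Note that this strategy can fail in some edge cases, e.g. when two original timestamps are nearest to the same grid point, so one of them overwrites the other.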
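If that edge case matters, one way around it (a different technique from the nearest-point replacement above, and only a sketch) is to keep every original point by construction: share the extra timestamps out between consecutive original points in proportion to each gap's length, then space them evenly inside each gap. The helper `upsample_keep_points` and its largest-remainder rounding are hypothetical choices of mine, not a pandas API; here `n_total` counts the original points too, so the result has exactly `n_total` entries.

```python
import numpy as np
import pandas as pd


def upsample_keep_points(ts, n_total):
    """Hypothetical helper: return n_total sorted timestamps containing every point of ts.

    Extra timestamps are split between consecutive original points in
    proportion to each gap's length (largest-remainder rounding) and
    spaced evenly inside each gap.
    """
    ts = pd.DatetimeIndex(ts).sort_values()
    n_extra = n_total - len(ts)
    if len(ts) < 2 or ts[0] == ts[-1] or n_extra < 0:
        raise ValueError("need 2+ distinct points and n_total >= len(ts)")

    gaps = np.diff(ts.values) / np.timedelta64(1, "s")  # gap lengths in seconds
    share = gaps / gaps.sum() * n_extra                 # ideal fractional count per gap
    counts = np.floor(share).astype(int)
    # give the points lost to rounding down to the gaps with the largest remainders
    leftover = n_extra - counts.sum()
    counts[np.argsort(share - counts)[::-1][:leftover]] += 1

    pieces = []
    for i, k in enumerate(counts):
        # k + 2 evenly spaced points incl. both ends; keep only the k inner ones
        inner = pd.date_range(ts[i], ts[i + 1], periods=int(k) + 2)[1:-1]
        # the original point itself is taken straight from ts, so it is kept exactly
        pieces.append(ts[i:i + 1].append(inner))
    pieces.append(ts[-1:])  # the last original point closes the series
    return pieces[0].append(pieces[1:])


ls = pd.to_datetime(['2016-01-30 12:10:00',
                     '2016-01-30 12:23:35',
                     '2016-01-30 12:24:14',
                     '2016-01-30 12:24:51',
                     '2016-01-30 12:25:00',
                     '2016-01-30 12:26:49',
                     '2016-01-30 12:27:36'])
new_ls = upsample_keep_points(ls, 50)
print(len(new_ls))            # 50
print(ls.isin(new_ls).all())  # True
```

With this data the long first gap (815 s of the 1056 s total) receives 33 of the 43 new timestamps, which is the "larger interval, more new points" behaviour the question asks for. Unlike the grid approach, the new points are not equally spaced across the whole range, only within each gap.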
Finally, you can convert it back to your original string format, e.g.:
ls = list(map("{:%Y-%m-%d %H:%M:%S}".format, ls_temp.tolist()))
Output:
['2016-01-30 12:10:00',
'2016-01-30 12:10:21',
'2016-01-30 12:10:42',
'2016-01-30 12:11:03',
'2016-01-30 12:11:24',
'2016-01-30 12:11:45',
'2016-01-30 12:12:06',
'2016-01-30 12:12:27',
'2016-01-30 12:12:48',
'2016-01-30 12:13:10',
'2016-01-30 12:13:31',
'2016-01-30 12:13:52',
'2016-01-30 12:14:13',
'2016-01-30 12:14:34',
'2016-01-30 12:14:55',
'2016-01-30 12:15:16',
'2016-01-30 12:15:37',
'2016-01-30 12:15:59',
'2016-01-30 12:16:20',
'2016-01-30 12:16:41',
'2016-01-30 12:17:02',
'2016-01-30 12:17:23',
'2016-01-30 12:17:44',
'2016-01-30 12:18:05',
'2016-01-30 12:18:26',
'2016-01-30 12:18:48',
'2016-01-30 12:19:09',
'2016-01-30 12:19:30',
'2016-01-30 12:19:51',
'2016-01-30 12:20:12',
'2016-01-30 12:20:33',
'2016-01-30 12:20:54',
'2016-01-30 12:21:15',
'2016-01-30 12:21:36',
'2016-01-30 12:21:58',
'2016-01-30 12:22:19',
'2016-01-30 12:22:40',
'2016-01-30 12:23:01',
'2016-01-30 12:23:22',
'2016-01-30 12:23:35',
'2016-01-30 12:24:14',
'2016-01-30 12:24:25',
'2016-01-30 12:24:51',
'2016-01-30 12:25:00',
'2016-01-30 12:25:29',
'2016-01-30 12:25:50',
'2016-01-30 12:26:11',
'2016-01-30 12:26:32',
'2016-01-30 12:26:49',
'2016-01-30 12:27:14',
'2016-01-30 12:27:36']
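However, if you want to resample data associated with the timestamps, it is better to keep them as a pd.Series
and skip the last step. In the end, you may want to use pd.DataFrame.resample
anyway.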
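If there are measurements attached to the timestamps, one possible sketch (the values below are made up purely for illustration) is to reindex the series onto the union of the original and the evenly spaced timestamps and interpolate with `method="time"`, which weights by the actual time distance. This uses `reindex` and `interpolate` rather than `resample`, and keeps every original timestamp and its value unchanged:

```python
import pandas as pd

ls = pd.to_datetime(['2016-01-30 12:10:00',
                     '2016-01-30 12:23:35',
                     '2016-01-30 12:24:14',
                     '2016-01-30 12:24:51',
                     '2016-01-30 12:25:00',
                     '2016-01-30 12:26:49',
                     '2016-01-30 12:27:36'])
# hypothetical measurements, one per original timestamp
values = pd.Series([0.0, 1.0, 4.0, 2.0, 3.0, 5.0, 6.0], index=ls)

# evenly spaced timestamps between the first and the last original point
grid = pd.date_range(start=ls.min(), end=ls.max(), periods=51)

# the union keeps every original timestamp; only the new ones are NaN before interpolating
upsampled = values.reindex(ls.union(grid)).interpolate(method="time")

print(upsampled.loc[ls].equals(values))  # True
```

Because the first and last grid points coincide with the first and last original timestamps, every new point lies between two known values, so nothing is left as NaN.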