Subset df on specific timestamps and previous seconds - python
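I have a df that contains timestamps and separate values. The timestamps are recorded at millisecond resolution (10 rows per second). I want to subset on specific points in time plus the preceding rows within that second.
Using the code below, the timestamps are returned. I then subtract just under a second (0.9 s) from each and concatenate the result back onto the original df. However, I want to include all points in time within that one second only, then skip to the next timestamp and all the points within its second.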
import pandas as pd

df = pd.DataFrame({
    'Time' : ['2021-03-20 09:27:28.400','2021-03-20 09:29:15.200','2021-03-20 09:30:38.200'],
    'Label' : ['A','B','A'],
    })
df['Time'] = pd.to_datetime(df['Time'])
# start of each window: 0.9 s before the labelled timestamp
df_prev = df.copy()
df_prev['Time'] = df_prev['Time'] - pd.Timedelta('0.9sec')
df_prev = df_prev[['Time']]
df_out = pd.concat([df, df_prev]).sort_values(by = 'Time').reset_index(drop = True)
# asfreq fills every 0.1 s step between the first and last timestamp,
# not just the steps inside each window
df_out = (df_out.set_index(['Time', df_out.groupby('Time').cumcount()])
          .unstack()
          .asfreq('0.1S', method = 'pad')
          .stack(dropna = False)
          .reset_index(level = 1, drop = True)
          .reset_index()
          )
Expected output:
Time Label
1 2021-03-20 09:27:27.500 NaN
2 2021-03-20 09:27:27.600 NaN
3 2021-03-20 09:27:27.700 NaN
4 2021-03-20 09:27:27.800 NaN
5 2021-03-20 09:27:27.900 NaN
6 2021-03-20 09:27:28.000 NaN
7 2021-03-20 09:27:28.100 NaN
8 2021-03-20 09:27:28.200 NaN
9 2021-03-20 09:27:28.300 NaN
10 2021-03-20 09:27:28.400 A
11 2021-03-20 09:29:14.300 NaN
12 2021-03-20 09:29:14.400 NaN
13 2021-03-20 09:29:14.500 NaN
14 2021-03-20 09:29:14.600 NaN
15 2021-03-20 09:29:14.700 NaN
16 2021-03-20 09:29:14.800 NaN
17 2021-03-20 09:29:14.900 NaN
18 2021-03-20 09:29:15.000 NaN
19 2021-03-20 09:29:15.100 NaN
20 2021-03-20 09:29:15.200 B
21 2021-03-20 09:30:37.300 NaN
22 2021-03-20 09:30:37.400 NaN
23 2021-03-20 09:30:37.500 NaN
24 2021-03-20 09:30:37.600 NaN
25 2021-03-20 09:30:37.700 NaN
26 2021-03-20 09:30:37.800 NaN
27 2021-03-20 09:30:37.900 NaN
28 2021-03-20 09:30:38.000 NaN
29 2021-03-20 09:30:38.100 NaN
30 2021-03-20 09:30:38.200 A
One approach is to build a list of dates and do an outer merge with the original df:
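# start of each window: 900 ms before the labelled timestamp
prev = df.Time - pd.Timedelta('900ms')
# build new dates: 10 evenly spaced timestamps per window, both ends included
new_values = pd.concat(pd.date_range(start, end,
                                     periods=10,
                                     name = 'Time').to_series(index=None)
                       for start, end in zip(prev, df.Time))
new_values.index = range(len(new_values))
# the outer merge keeps every generated timestamp; Label is NaN except on the original rows
df.merge(new_values, on='Time', how='outer', sort = True)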
Out[286]:
Time Label
0 2021-03-20 09:27:27.500 NaN
1 2021-03-20 09:27:27.600 NaN
2 2021-03-20 09:27:27.700 NaN
3 2021-03-20 09:27:27.800 NaN
4 2021-03-20 09:27:27.900 NaN
5 2021-03-20 09:27:28.000 NaN
6 2021-03-20 09:27:28.100 NaN
7 2021-03-20 09:27:28.200 NaN
8 2021-03-20 09:27:28.300 NaN
9 2021-03-20 09:27:28.400 A
10 2021-03-20 09:29:14.300 NaN
11 2021-03-20 09:29:14.400 NaN
12 2021-03-20 09:29:14.500 NaN
13 2021-03-20 09:29:14.600 NaN
14 2021-03-20 09:29:14.700 NaN
15 2021-03-20 09:29:14.800 NaN
16 2021-03-20 09:29:14.900 NaN
17 2021-03-20 09:29:15.000 NaN
18 2021-03-20 09:29:15.100 NaN
19 2021-03-20 09:29:15.200 B
20 2021-03-20 09:30:37.300 NaN
21 2021-03-20 09:30:37.400 NaN
22 2021-03-20 09:30:37.500 NaN
23 2021-03-20 09:30:37.600 NaN
24 2021-03-20 09:30:37.700 NaN
25 2021-03-20 09:30:37.800 NaN
26 2021-03-20 09:30:37.900 NaN
27 2021-03-20 09:30:38.000 NaN
28 2021-03-20 09:30:38.100 NaN
29 2021-03-20 09:30:38.200 A
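If the full 10-rows-per-second recording is already in a DataFrame, another option is to filter that frame instead of generating new timestamps. The following is only a minimal sketch under that assumption: the `full` and `events` frames, the `Value` column and the two event times are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical full recording at 10 rows per second (illustrative data only).
full = pd.DataFrame({
    'Time': pd.date_range('2021-03-20 09:27:27.000',
                          '2021-03-20 09:27:30.000', freq='100ms'),
})
full['Value'] = range(len(full))

# Hypothetical timestamps of interest, shaped like the df in the question.
events = pd.DataFrame({
    'Time': [pd.Timestamp('2021-03-20 09:27:28.400'),
             pd.Timestamp('2021-03-20 09:27:29.600')],
    'Label': ['A', 'B'],
})

window = pd.Timedelta('900ms')

# Flag every row that falls in [t - 900 ms, t] for at least one event time t.
mask = pd.Series(False, index=full.index)
for t in events['Time']:
    mask |= full['Time'].between(t - window, t)

# Keep the flagged rows and attach the labels; rows that are not events get NaN.
subset = full[mask].merge(events, on='Time', how='left')
print(subset)
```

The loop is fine for a handful of timestamps; with many of them, something like `pd.merge_asof` with a `tolerance` may scale better, though that variant is not shown here.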