Subset df on specific timestamps and previous seconds - python

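I have a df that contains timestamps and separate values. The timestamps are recorded at millisecond resolution (10 rows per second). I want to subset on specific time points plus the preceding rows within that second.

Using the code below, the timestamps of interest are returned. I then subtract just under a second (0.9 s) from each one and concat the result back onto the original df. However, I want to include all of the time points within that one second only, then skip to the next timestamp and all of the time points within its second.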

import pandas as pd

df = pd.DataFrame({
    'Time': ['2021-03-20 09:27:28.400', '2021-03-20 09:29:15.200', '2021-03-20 09:30:38.200'],
    'Label': ['A', 'B', 'A'],
})

df['Time'] = pd.to_datetime(df['Time'])

# Copy of the timestamps shifted back by 0.9 s (the start of each window)
df_prev = df.copy()
df_prev['Time'] = df_prev['Time'] - pd.Timedelta('0.9sec')
df_prev = df_prev[['Time']]

df_out = pd.concat([df, df_prev]).sort_values(by='Time').reset_index(drop=True)

# Attempt to fill in the 100 ms steps. This pads the whole range between the
# first and last timestamp, not just the second before each labelled row.
df_out = (df_out.set_index(['Time', df_out.groupby('Time').cumcount()])
            .unstack()
            .asfreq('100ms', method='pad')
            .stack(dropna=False)
            .reset_index(level=1, drop=True)
            .reset_index()
            )

Expected output:

                      Time Label
1  2021-03-20 09:27:27.500   NaN
2  2021-03-20 09:27:27.600   NaN
3  2021-03-20 09:27:27.700   NaN
4  2021-03-20 09:27:27.800   NaN
5  2021-03-20 09:27:27.900   NaN
6  2021-03-20 09:27:28.000   NaN
7  2021-03-20 09:27:28.100   NaN
8  2021-03-20 09:27:28.200   NaN
9  2021-03-20 09:27:28.300   NaN
10 2021-03-20 09:27:28.400     A
11 2021-03-20 09:29:14.300   NaN
12 2021-03-20 09:29:14.400   NaN
13 2021-03-20 09:29:14.500   NaN
14 2021-03-20 09:29:14.600   NaN
15 2021-03-20 09:29:14.700   NaN
16 2021-03-20 09:29:14.800   NaN
17 2021-03-20 09:29:14.900   NaN
18 2021-03-20 09:29:15.000   NaN
19 2021-03-20 09:29:15.100   NaN
20 2021-03-20 09:29:15.200     B
21 2021-03-20 09:30:37.300   NaN
22 2021-03-20 09:30:37.400   NaN
23 2021-03-20 09:30:37.500   NaN
24 2021-03-20 09:30:37.600   NaN
25 2021-03-20 09:30:37.700   NaN
26 2021-03-20 09:30:37.800   NaN
27 2021-03-20 09:30:37.900   NaN
28 2021-03-20 09:30:38.000   NaN
29 2021-03-20 09:30:38.100   NaN
30 2021-03-20 09:30:38.200     A

One approach is to build a list of dates and do an outer merge with the original df:

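# start of each window: 900 ms before the labelled timestamp
prev = df.Time - pd.Timedelta('900ms')

# build new dates: 10 evenly spaced timestamps per window, end points included
new_values = pd.concat(pd.date_range(start, end,
                                     periods=10,
                                     name='Time').to_series(index=None)
                       for start, end in zip(prev, df.Time))

# replace the datetime index with a plain integer index
new_values.index = range(len(new_values))

# outer merge keeps every generated timestamp; Label is NaN where there is no match
df.merge(new_values, on='Time', how='outer', sort=True)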
Out[286]:
                      Time Label
0  2021-03-20 09:27:27.500   NaN
1  2021-03-20 09:27:27.600   NaN
2  2021-03-20 09:27:27.700   NaN
3  2021-03-20 09:27:27.800   NaN
4  2021-03-20 09:27:27.900   NaN
5  2021-03-20 09:27:28.000   NaN
6  2021-03-20 09:27:28.100   NaN
7  2021-03-20 09:27:28.200   NaN
8  2021-03-20 09:27:28.300   NaN
9  2021-03-20 09:27:28.400     A
10 2021-03-20 09:29:14.300   NaN
11 2021-03-20 09:29:14.400   NaN
12 2021-03-20 09:29:14.500   NaN
13 2021-03-20 09:29:14.600   NaN
14 2021-03-20 09:29:14.700   NaN
15 2021-03-20 09:29:14.800   NaN
16 2021-03-20 09:29:14.900   NaN
17 2021-03-20 09:29:15.000   NaN
18 2021-03-20 09:29:15.100   NaN
19 2021-03-20 09:29:15.200     B
20 2021-03-20 09:30:37.300   NaN
21 2021-03-20 09:30:37.400   NaN
22 2021-03-20 09:30:37.500   NaN
23 2021-03-20 09:30:37.600   NaN
24 2021-03-20 09:30:37.700   NaN
25 2021-03-20 09:30:37.800   NaN
26 2021-03-20 09:30:37.900   NaN
27 2021-03-20 09:30:38.000   NaN
28 2021-03-20 09:30:38.100   NaN
29 2021-03-20 09:30:38.200     A
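
A possible variation on the same idea, offered as a sketch and not as the only way to do it: anchor each window at its end point with a fixed 100 ms step, then left-merge the labels onto the generated grid. The names `n_rows`, `step`, `windows`, `grid` and `out` are made up for this example, and the 10-rows-per-window, 100 ms spacing is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime(['2021-03-20 09:27:28.400',
                            '2021-03-20 09:29:15.200',
                            '2021-03-20 09:30:38.200']),
    'Label': ['A', 'B', 'A'],
})

# Assumed window shape: 10 rows per window, 100 ms apart (from the question)
n_rows = 10
step = '100ms'

# For each labelled timestamp, generate the n_rows timestamps ending at it
windows = [pd.date_range(end=t, periods=n_rows, freq=step) for t in df['Time']]

# Stack the windows into one column; drop duplicates in case windows overlap
grid = (pd.concat([pd.DataFrame({'Time': w}) for w in windows], ignore_index=True)
          .drop_duplicates()
          .sort_values('Time', ignore_index=True))

# Left merge keeps every grid timestamp; Label is NaN except on the labelled rows
out = grid.merge(df, on='Time', how='left')
print(out)
```

If the full 10 Hz recording is already in a DataFrame, the `Time` column of `grid` could probably be used with `Series.isin` to filter it down to the same rows, assuming the recorded timestamps fall exactly on the 100 ms grid.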