Pandas: for each new value in a column, remove the following two rows

Question

I have the following DataFrame:

time   alarm
0       0
1       1
2       0
3       1
4       1
5       1
6       1
7       0
8       0
9       1
10      0

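The alarm column indicates an alarm: it takes the value 1 when the alarm goes off.
Every time the alarm goes off, I want to "silence" the next two rows. Then, if it goes off again after the silent period, I want to silence the two rows that follow, and so on.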

In other words, I want to obtain the following DataFrame:

time   alarm    silenced
0       0       no
1       1       no
2       0       yes
3       1       yes
4       1       no
5       1       yes
6       1       yes
7       0       no
8       0       no
9       1       no
10      0       yes

I managed to do this with a for loop or a lambda function, but I need to speed up the computation.
Can anyone help? Thanks in advance!

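P.S. Since I will eventually drop the "silenced" rows, a solution that drops those rows directly would also be accepted. In that case, the result should be: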

time   alarm
0       0
1       1
4       1
7       0
8       0
9       1
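
For reference, once a silenced column exists, the drop step itself is just a boolean filter. A minimal sketch, assuming the flagged frame shown earlier (the variable names flagged and kept are only illustrative):

```python
import pandas as pd

# The flagged frame from the desired output above, typed in by hand.
flagged = pd.DataFrame({
    "time": range(11),
    "alarm": [0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0],
    "silenced": ["no", "no", "yes", "yes", "no", "yes",
                 "yes", "no", "no", "no", "yes"],
})

# Keep only the rows that were not silenced, then drop the helper column.
kept = flagged.loc[flagged["silenced"] == "no", ["time", "alarm"]]
print(kept)
```

The hard part is computing the silenced column quickly, since each row's state depends on the rows before it.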

My attempt, using a for loop inside a helper function:

import numpy as np
import pandas as pd

df = pd.DataFrame({"time":[0,1,2,3,4,5,6,7,8,9,10], "alarm":[0,1,0,1,1,1,1,0,0,1,0]})
df

def fun_silence(df):
    
    # bool: if True,  we are in a "silent" period 
    #       if False, we can consider the current time as a possible alarm
    flag_silent = False
    
    # time of the *last* alarm
    alarm_time = np.nan
    
    # loop over rows
    for index, row in df.iterrows():
        
        # if we are in a silent period
        if flag_silent:
            
            # if 2 time steps passed from the last alarm, we end the silent period
            if row['time'] - alarm_time > 2:
                flag_silent = False
                
            # otherwise, we mark this row as silenced
            else:
                df.at[index, 'silenced'] = 1
          
        # if there is an alarm and we are not in a silent period
        if row['alarm'] == 1 and not flag_silent:
            # save the alarm time
            alarm_time = row['time']
            # enter in a silent period
            flag_silent = True
            
    return df
    
df['silenced'] = 0
df_silenced = fun_silence(df)
df_silenced

Answer 1

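I don't think you can avoid a for loop for this problem, but you can certainly streamline the function and then compile it with numba to get C-like speed on large datasets.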

from numba import njit

@njit
def silence(alarm):
    count = 0
    for a in alarm:
        if count > 0:
            yield True
            count -= 1
        elif count == 0 and a == 1:
            count = 2
            yield False
        else:
            yield False

    
df['silenced'] = [*silence(df['alarm'].to_numpy())]

    time  alarm  silenced
0      0      0     False
1      1      1     False
2      2      0      True
3      3      1      True
4      4      1     False
5      5      1      True
6      6      1      True
7      7      0     False
8      8      0     False
9      9      1     False
10    10      0      True
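
A variant of the same idea, offered as a minimal sketch rather than a definitive implementation: the loop fills a preallocated boolean array instead of yielding values one by one, so no intermediate Python list is built. The name silence_mask and the n_silent parameter are hypothetical, and the try/except fallback is an assumption added so the snippet still runs (slowly, in plain Python) when numba is not installed.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to an uncompiled function if numba is missing.
    def njit(func):
        return func


@njit
def silence_mask(alarm, n_silent):
    """Return a boolean array that is True for rows inside a silent period."""
    n = alarm.shape[0]
    out = np.zeros(n, dtype=np.bool_)
    count = 0  # rows still to be silenced after the last accepted alarm
    for i in range(n):
        if count > 0:
            # Inside a silent period: flag the row, whatever its alarm value.
            out[i] = True
            count -= 1
        elif alarm[i] == 1:
            # An accepted alarm starts a new silent period of n_silent rows.
            count = n_silent
    return out


df = pd.DataFrame({
    "time": range(11),
    "alarm": [0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0],
})

mask = silence_mask(df["alarm"].to_numpy(), 2)
df["silenced"] = mask

# Drop the silenced rows directly, as allowed by the question's P.S.
df_kept = df.loc[~mask, ["time", "alarm"]]
print(df_kept)  # time values kept: 0, 1, 4, 7, 8, 9
```

Passing the window length as an argument keeps the function reusable if the silent period is ever something other than two rows.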
