Pandas: for each new value in a column, remove the following two rows
I have the following dataframe:
time alarm
0 0
1 1
2 0
3 1
4 1
5 1
6 1
7 0
8 0
9 1
10 0
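The alarm column represents an alarm. When the alarm goes off, it takes the value 1.
Each time the alarm goes off, I want to "silence" the following two rows. Then, if it goes off again after the silent period, I want to silence the next two rows, and so on.
In other words, I want to obtain the following dataframe: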
time alarm silenced
0 0 no
1 1 no
2 0 yes
3 1 yes
4 1 no
5 1 yes
6 1 yes
7 0 no
8 0 no
9 1 no
10 0 yes
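I managed to do it with a for loop or a lambda function, but I need to speed up the computation.
Can anyone help me? Thanks in advance!
P.S. Since I will eventually drop the "silenced" rows, a solution that removes those rows directly would also be accepted. In that case, the result should be: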
time alarm
0 0
1 1
4 1
7 0
8 0
9 1
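For reference, once a silenced column exists, the reduced frame above is just a boolean filter. The sketch below types the expected yes/no labels in by hand (the variable names `expected` and `kept` are made up for illustration), so it only shows the filtering step, not how the labels are computed.

```python
import pandas as pd

# Expected frame from the question, with the yes/no labels entered by hand
expected = pd.DataFrame({
    "time": list(range(11)),
    "alarm": [0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0],
    "silenced": ["no", "no", "yes", "yes", "no", "yes",
                 "yes", "no", "no", "no", "yes"],
})

# Keep only the rows that were not silenced, then drop the helper column
kept = expected[expected["silenced"] == "no"].drop(columns="silenced")
```

The rows left in `kept` are times 0, 1, 4, 7, 8 and 9, matching the table above.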
My attempt, using a for loop in a helper function:
import numpy as np
import pandas as pd

df = pd.DataFrame({"time":[0,1,2,3,4,5,6,7,8,9,10], "alarm":[0,1,0,1,1,1,1,0,0,1,0]})
df

def fun_silence(df):
    # bool: if True, we are in a "silent" period
    # if False, we can consider the current time as a possible alarm
    flag_silent = False
    # time of the *last* alarm
    alarm_time = np.nan
    # loop over rows
    for index, row in df.iterrows():
        # if we are in a silent period
        if flag_silent:
            # if more than 2 time steps passed since the last alarm, we end the silent period
            if row['time'] - alarm_time > 2:
                flag_silent = False
            # otherwise, we mark this row as silenced
            else:
                df.at[index, 'silenced'] = 1
        # if there is an alarm and we are not in a silent period
        if row['alarm'] == 1 and not flag_silent:
            # save the alarm time
            alarm_time = row['time']
            # enter a silent period
            flag_silent = True
    return df

df['silenced'] = 0
df_silenced = fun_silence(df)
df_silenced
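I don't think you can avoid a for-loop in this problem, but you can certainly simplify the function and then compile it with numba to get C-like speed on large datasets.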
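from numba import njit

@njit
def silence(alarm):
    # number of upcoming rows that still have to be silenced
    count = 0
    for a in alarm:
        if count > 0:
            yield True
            count -= 1
        elif count == 0 and a == 1:
            # an alarm outside a silent period: silence the next two rows
            count = 2
            yield False
        else:
            yield False

df['silenced'] = [*silence(df['alarm'].to_numpy())]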
    time  alarm  silenced
0      0      0     False
1      1      1     False
2      2      0      True
3      3      1      True
4      4      1     False
5      5      1      True
6      6      1      True
7      7      0     False
8      8      0     False
9      9      1     False
10    10      0      True
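As a follow-up sketch (not part of the answer above), the same countdown logic can fill a preallocated boolean array instead of yielding values, which avoids building a Python list from a generator and makes it easy to drop the silenced rows directly, as the P.S. asks. The function name `silence_mask` and its `n_silent` parameter are hypothetical. The `try/except` fallback is an assumption added so the snippet still runs, uncompiled and slowly, where numba is not installed.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Assumption: without numba, fall back to the plain Python function
    def njit(func):
        return func


@njit
def silence_mask(alarm, n_silent):
    # out[i] is True when row i falls inside a silent period
    out = np.zeros(alarm.shape[0], dtype=np.bool_)
    count = 0  # rows that still have to be silenced
    for i in range(alarm.shape[0]):
        if count > 0:
            out[i] = True
            count -= 1
        elif alarm[i] == 1:
            # alarm outside a silent period: silence the next n_silent rows
            count = n_silent
    return out


df = pd.DataFrame({"time": list(range(11)),
                   "alarm": [0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]})
mask = silence_mask(df["alarm"].to_numpy(), 2)
df["silenced"] = mask

# Drop the silenced rows directly instead of keeping the flag column
df_kept = df.loc[~mask, ["time", "alarm"]]
```

Like the generator version, this counts rows rather than comparing `time` values, so it matches the question's loop only when consecutive rows are one time step apart.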