Deleting specific rows (noise) after detecting a specific value in a DataFrame

This is a follow-up to a question I asked earlier.

But let me explain it again.

I have a DataFrame in pandas (Python) that looks like this:

    time_s  wow    lat_deg    lon_deg
0      0.0  0.0  35.042628 -89.978249
1      2.0  0.0  35.042628 -89.978249
2      4.0  0.0  35.042628 -89.978249
3      6.0  0.0  35.042628 -89.978249
4      8.0    1  35.042628 -89.978249
5     10.0  0.0  35.042628 -89.978249
6     12.0  0.0  35.042628 -89.978249
7     14.0  0.0  35.042628 -89.978249
8     16.0    1  35.042628 -89.978249
9     18.0    1  35.042628 -89.978249
10    20.0  0.0  35.042628 -89.978249
11    22.0  0.0  35.042628 -89.978249
..     ...  ...        ...        ...

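The wow column is defined to hold only the values 0 and 1. Unfortunately, my data contains some noise, which makes the total number of entities (in the rows) larger than it should be (there are actually 500 records, but because of the noise 507 are detected).

So I intend to remove the noise before processing.

The raw data looks like this: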

(...,0,0,0,0,1,1,1,0,1,1, 1,1,0,0,0,0,...)

I need to trim the data by deleting the "0" values that sit between 1s (1,0,1), so that it becomes

(...,0,0,0,0,1,1,1,1,1,1, 1,0,0,0,0,...)

How can I do this?

You can try using shift to build a mask on the DataFrame:

df[(df['wow'] == 0) & ((df['wow'].shift(1) ==1) | (df['wow'].shift(-1) ==1))]

Output:

    time_s  wow lat_deg     lon_deg
3   6.0     0.0 35.042628   -89.978249
5   10.0    0.0 35.042628   -89.978249
7   14.0    0.0 35.042628   -89.978249
10  20.0    0.0 35.042628   -89.978249
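
As a hedged illustration of what this mask selects: it matches any 0 that has a 1 on either side, so it also flags the zeros bordering a legitimate run of 1s. The sketch below rebuilds only the time_s and wow columns from the sample data (the names mask, flagged and kept are my own, not from the answer) and lists the flagged index labels. Negating the mask would drop those rows instead of selecting them.

```python
import pandas as pd

# Rebuild the sample from the question (only the columns needed here).
df = pd.DataFrame({
    "time_s": [float(t) for t in range(0, 24, 2)],
    "wow": [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
})

# A 0 whose previous OR next value is 1.
mask = (df["wow"] == 0) & ((df["wow"].shift(1) == 1) | (df["wow"].shift(-1) == 1))

flagged = df[mask]   # rows the mask selects
kept = df[~mask]     # rows that would remain if the flagged ones were dropped

print(flagged.index.tolist())  # [3, 5, 7, 10]
```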

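Assuming you only need to get rid of a 0 that is immediately surrounded by a pair of 1s (i.e. only the 1 0 1 pattern), you can set up a boolean mask and filter the rows with boolean indexing, as follows: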

m = (df['wow'] == 0.0) & (df['wow'].shift(1) == 1.0) & (df['wow'].shift(-1) == 1.0)
df[~m]

Demo

Since your sample data does not contain such a case, I modified the data slightly:

print(df)


    time_s  wow    lat_deg    lon_deg
0      0.0  0.0  35.042628 -89.978249
1      2.0  0.0  35.042628 -89.978249
2      4.0  0.0  35.042628 -89.978249
3      6.0  0.0  35.042628 -89.978249
4      8.0  1.0  35.042628 -89.978249
5     10.0  0.0  35.042628 -89.978249          <== Matching entry to get rid of 
6     12.0  1.0  35.042628 -89.978249
7     14.0  0.0  35.042628 -89.978249          <== Matching entry to get rid of 
8     16.0  1.0  35.042628 -89.978249
9     18.0  1.0  35.042628 -89.978249
10    20.0  0.0  35.042628 -89.978249
11    22.0  0.0  35.042628 -89.978249


m = (df['wow'] == 0.0) & (df['wow'].shift(1) == 1.0) & (df['wow'].shift(-1) == 1.0)
df[~m]


    time_s  wow    lat_deg    lon_deg
0      0.0  0.0  35.042628 -89.978249
1      2.0  0.0  35.042628 -89.978249
2      4.0  0.0  35.042628 -89.978249
3      6.0  0.0  35.042628 -89.978249
4      8.0  1.0  35.042628 -89.978249
6     12.0  1.0  35.042628 -89.978249
8     16.0  1.0  35.042628 -89.978249
9     18.0  1.0  35.042628 -89.978249
10    20.0  0.0  35.042628 -89.978249
11    22.0  0.0  35.042628 -89.978249
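
For anyone who wants to reproduce the demo end to end, here is a minimal self-contained sketch. It rebuilds the modified wow column, applies the same mask, and then counts runs of consecutive 1s before and after as a rough sanity check on the "500 vs 507" symptom from the question. The helper count_runs_of_ones and the variable names are hypothetical, and treating one run of 1s as one "entity" is my assumption about what the question means.

```python
import pandas as pd

# Modified sample from the demo (time_s and wow only).
wow = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
df = pd.DataFrame({"time_s": [2.0 * i for i in range(len(wow))], "wow": wow})

def count_runs_of_ones(s):
    """Count runs of consecutive 1s (hypothetical helper)."""
    # A run starts where the value is 1 and the previous value is not 1.
    starts = (s == 1) & (s.shift(1) != 1)
    return int(starts.sum())

# A 0 whose previous AND next values are both 1 (the 1 0 1 pattern).
m = (df["wow"] == 0.0) & (df["wow"].shift(1) == 1.0) & (df["wow"].shift(-1) == 1.0)
cleaned = df[~m]

runs_before = count_runs_of_ones(df["wow"])
runs_after = count_runs_of_ones(cleaned["wow"])
print(runs_before, runs_after)  # 3 1
```

The filtered frame keeps the original index labels (5 and 7 are missing). If later steps expect consecutive labels, cleaned.reset_index(drop=True) renumbers the rows. Note that this mask only removes a single isolated 0; two or more consecutive zeros between 1s are left in place.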