Deleting specific rows (noise) after detecting a specific value in a DataFrame
This is a continuation of a question I asked earlier, but let me explain it again.
I have a DataFrame in pandas (Python) that looks like this:
| | time_s | wow | lat_deg | lon_deg |
|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 35.042628 | -89.978249 |
| 1 | 2.0 | 0.0 | 35.042628 | -89.978249 |
| 2 | 4.0 | 0.0 | 35.042628 | -89.978249 |
| 3 | 6.0 | 0.0 | 35.042628 | -89.978249 |
| 4 | 8.0 | 1 | 35.042628 | -89.978249 |
| 5 | 10.0 | 0.0 | 35.042628 | -89.978249 |
| 6 | 12.0 | 0.0 | 35.042628 | -89.978249 |
| 7 | 14.0 | 0.0 | 35.042628 | -89.978249 |
| 8 | 16.0 | 1 | 35.042628 | -89.978249 |
| 9 | 18.0 | 1 | 35.042628 | -89.978249 |
| 10 | 20.0 | 0.0 | 35.042628 | -89.978249 |
| 11 | 22.0 | 0.0 | 35.042628 | -89.978249 |
| ... | ... | ... | ... | ... |
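The wow column is defined to hold the values 0 and 1. Unfortunately, my data contains some noise, so the number of entities (rows) comes out higher than it should (there are actually 500 data points, but because of the noise 507 are detected).
So I want to remove the noise before processing.
The raw data looks like this:
(...,0,0,0,0,1,1,1,0,1,1,1,1,0,0,0,0,...)
I need to trim the data by deleting the "0" values that sit between 1s (1,0,1), so that it becomes
(...,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,...)
How can I do this?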
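You can try using shift to build a mask on the DataFrame. The expression below selects the rows where wow is 0 and at least one neighbouring row is 1:
df[(df['wow'] == 0) & ((df['wow'].shift(1) == 1) | (df['wow'].shift(-1) == 1))]
Output: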
time_s wow lat_deg lon_deg
3 6.0 0.0 35.042628 -89.978249
5 10.0 0.0 35.042628 -89.978249
7 14.0 0.0 35.042628 -89.978249
10 20.0 0.0 35.042628 -89.978249
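To make the selection above reproducible, here is a minimal sketch that rebuilds the question's twelve sample rows and applies the same shift-based mask. The variable names (prev_is_one, next_is_one, edge_zeros, selected, remaining) are my own, and the frame is reconstructed from the table in the question rather than from the asker's real data. Note that this mask matches a 0 with a 1 on either side, so it also catches the zeros that border a legitimate block of 1s (index 3 and 10 here).

```python
import pandas as pd

# Sample data rebuilt from the question: wow is 1 at index 4, 8 and 9.
df = pd.DataFrame({
    "time_s": [2.0 * i for i in range(12)],
    "wow": [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    "lat_deg": 35.042628,
    "lon_deg": -89.978249,
})

# shift(1) looks at the previous row, shift(-1) at the next row.
prev_is_one = df["wow"].shift(1) == 1
next_is_one = df["wow"].shift(-1) == 1

# Rows where wow is 0 and at least one direct neighbour is 1.
edge_zeros = (df["wow"] == 0) & (prev_is_one | next_is_one)

selected = df[edge_zeros]     # the rows the expression above returns
remaining = df[~edge_zeros]   # the same rows dropped instead of selected

print(selected)
```

Inverting the mask with ~ is what turns the selection into a deletion; whether dropping every zero next to a 1 is what you want depends on how the noise actually looks in your data.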
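Assuming you only need to get rid of a 0 that is immediately surrounded by a pair of 1s (i.e. only the sequence 1 0 1 is considered), you can set up a boolean mask and filter the rows with it (df[~m] here; df.loc[~m] is equivalent), as follows: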
m = (df['wow'] == 0.0) & (df['wow'].shift(1) == 1.0) & (df['wow'].shift(-1) == 1.0)
df[~m]
Demo
Since your sample data does not contain this case, I modified the data slightly:
print(df)
time_s wow lat_deg lon_deg
0 0.0 0.0 35.042628 -89.978249
1 2.0 0.0 35.042628 -89.978249
2 4.0 0.0 35.042628 -89.978249
3 6.0 0.0 35.042628 -89.978249
4 8.0 1.0 35.042628 -89.978249
5 10.0 0.0 35.042628 -89.978249 <== Matching entry to get rid of
6 12.0 1.0 35.042628 -89.978249
7 14.0 0.0 35.042628 -89.978249 <== Matching entry to get rid of
8 16.0 1.0 35.042628 -89.978249
9 18.0 1.0 35.042628 -89.978249
10 20.0 0.0 35.042628 -89.978249
11 22.0 0.0 35.042628 -89.978249
m = (df['wow'] == 0.0) & (df['wow'].shift(1) == 1.0) & (df['wow'].shift(-1) == 1.0)
df[~m]
time_s wow lat_deg lon_deg
0 0.0 0.0 35.042628 -89.978249
1 2.0 0.0 35.042628 -89.978249
2 4.0 0.0 35.042628 -89.978249
3 6.0 0.0 35.042628 -89.978249
4 8.0 1.0 35.042628 -89.978249
6 12.0 1.0 35.042628 -89.978249
8 16.0 1.0 35.042628 -89.978249
9 18.0 1.0 35.042628 -89.978249
10 20.0 0.0 35.042628 -89.978249
11 22.0 0.0 35.042628 -89.978249
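The demo can be run end to end with the sketch below, which rebuilds the modified sample data and applies the same mask m. It also includes drop_short_zero_runs, a hypothetical helper of my own (not part of the answer above) for the case where the noise spans more than one row. That case is an assumption: the question only shows a single 0 between 1s, so treat max_gap as a parameter you would have to choose yourself.

```python
import pandas as pd

# Modified sample data from the demo: isolated zeros at index 5 and 7.
wow = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0]
df = pd.DataFrame({
    "time_s": [2.0 * i for i in range(len(wow))],
    "wow": [float(v) for v in wow],
    "lat_deg": 35.042628,
    "lon_deg": -89.978249,
})

# True only for a 0 whose previous and next rows are both 1.
m = (df["wow"] == 0.0) & (df["wow"].shift(1) == 1.0) & (df["wow"].shift(-1) == 1.0)
cleaned = df[~m]
print(cleaned)


def drop_short_zero_runs(frame, col="wow", max_gap=1):
    """Hypothetical helper: drop runs of at most max_gap zeros bounded by 1s."""
    s = frame[col]
    # Give every run of consecutive equal values its own integer label.
    run_id = (s != s.shift()).cumsum()
    # One row per run: the value it holds and how many rows it covers.
    runs = s.groupby(run_id).agg(value="first", length="size")
    runs["prev"] = runs["value"].shift(1)    # value of the run before
    runs["next"] = runs["value"].shift(-1)   # value of the run after
    is_noise = (
        (runs["value"] == 0)
        & (runs["length"] <= max_gap)
        & (runs["prev"] == 1)
        & (runs["next"] == 1)
    )
    noisy_runs = runs.index[is_noise]
    return frame[~run_id.isin(noisy_runs)]


# With max_gap=1 the helper selects the same rows as the mask m.
same_as_mask = drop_short_zero_runs(df, max_gap=1)
```

The original index labels are kept after filtering (4, 6, 8, ...); if you want a contiguous index afterwards, calling reset_index(drop=True) on the result is one option.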