Smoothing by bin boundaries using pandas/numpy
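I have formed the bins using the pandas.cut function. Now, to perform smoothing by bin boundaries, I use the groupby function to calculate the minimum and maximum value of each bin.
Minimum values: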
           date  births  with noise
bin
A    1959-01-31      23   19.921049
B    1959-01-02      27   25.921175
C    1959-01-01      30   32.064698
D    1959-01-08      35   38.507170
E    1959-01-05      41   45.022163
F    1959-01-13      47   51.821755
G    1959-03-27      56   59.416700
H    1959-09-23      73   70.140119
Maximum values:
           date  births  with noise
bin
A    1959-07-12      30   25.161292
B    1959-12-11      35   31.738422
C    1959-12-27      42   38.447807
D    1959-12-20      48   44.919703
E    1959-12-31      56   51.274550
F    1959-12-30      59   57.515927
G    1959-11-05      68   63.970382
H    1959-09-23      73   70.140119
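Now I want to replace the values in the original dataframe. If a value is less than the mean of its bin, it should be replaced with the minimum value of that bin; if it is greater than the mean, it should be replaced with the maximum value.
My dataframe looks like this: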
         date  births  with noise bin  smooth_val_mean
0  1959-01-01      35   36.964692   C        35.461173
1  1959-01-02      32   29.861393   B        29.592061
2  1959-01-03      30   27.268515   B        29.592061
3  1959-01-04      31   31.513148   B        29.592061
4  1959-01-05      44   46.194690   E        47.850101
How should I do this using pandas/numpy?
Let's try this function:
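import numpy as np

# df_mean, df_min and df_max are assumed to be the per-bin groupby results, indexed by bin label
def thresh(col):
    # look up each row's bin statistic through its bin label
    means = df['bin'].replace(df_mean[col])
    mins = df['bin'].replace(df_min[col])
    maxs = df['bin'].replace(df_max[col])
    # +1 above the bin mean, -1 below it, 0 when equal (np.sign, not np.signs)
    signs = np.sign(df[col] - means)
    df[f'{col}_smooth'] = np.select((signs==1, signs==-1), (maxs, mins), means)

for col in ['with noise']:
    thresh(col)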
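
As an alternative, here is a minimal self-contained sketch of the same rule that skips the separate df_mean, df_min and df_max frames and broadcasts the per-bin statistics back onto the rows with groupby(...).transform. The sample values, the bin edges and the helper name smooth_by_boundaries are assumptions made for illustration; they are not taken from the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the bin edges and labels below are assumptions.
df = pd.DataFrame({"with noise": [36.96, 29.86, 27.27, 31.51, 46.19, 33.10]})
df["bin"] = pd.cut(df["with noise"], bins=[25, 32, 40, 50], labels=["B", "C", "E"])

def smooth_by_boundaries(frame, col, by="bin"):
    """Replace each value with its bin minimum or maximum, depending on the bin mean."""
    grouped = frame.groupby(by, observed=True)[col]
    # transform returns one statistic per row, aligned with the original index
    means = grouped.transform("mean").to_numpy()
    mins = grouped.transform("min").to_numpy()
    maxs = grouped.transform("max").to_numpy()
    values = frame[col].to_numpy()
    # above the mean -> bin max, below the mean -> bin min, equal -> unchanged
    smoothed = np.select([values > means, values < means], [maxs, mins], default=values)
    return pd.Series(smoothed, index=frame.index)

df["with noise_smooth"] = smooth_by_boundaries(df, "with noise")
print(df)
```

Because transform keeps the original row order, no lookup by bin label is needed, which also sidesteps any dtype issues with the categorical column that pandas.cut produces.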