Symmetric number of bins in qcut around zero
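I have a pandas DataFrame in which each row holds a varying number of integers and NaNs. I want to assign the values in each row to 8 bins: 4 bins for the negative values and 4 bins for the positive values. As a result, each bin in each row will contain a different number of values. Any tips on how to adapt the qcut function for this? Thanks!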
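If I understand correctly, you can simply run qcut on the positive values and qcut on the negative values separately.
For example, given the DataFrame: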
>>> df
vals
0 -0.456460
1 0.448368
2 0.186750
3 1.056617
4 -0.035620
5 -0.609843
6 0.126376
7 0.160817
8 -1.495441
9 0.730763
10 -0.005071
11 0.677918
12 -0.779553
13 0.717374
14 2.250258
15 -0.801028
16 0.306408
17 0.538970
18 -2.120528
19 1.066903
Use two qcut calls, one for the positive values and one for the negative values.
df.loc[df.vals > 0,'bin'] = pd.qcut(df.loc[df.vals > 0,'vals'], q=4)
df.loc[df.vals < 0,'bin'] = pd.qcut(df.loc[df.vals < 0,'vals'], q=4)
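For readers who want to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the example frame from the values shown above. The names `pos`, `neg` and `bins` are introduced here for illustration only. As a precaution, it converts each qcut result to plain objects and combines them with `pd.concat`, so the result does not depend on how in-place assignment handles two categoricals with different categories.

```python
import pandas as pd

# Values copied from the example DataFrame above.
vals = [-0.456460, 0.448368, 0.186750, 1.056617, -0.035620,
        -0.609843, 0.126376, 0.160817, -1.495441, 0.730763,
        -0.005071, 0.677918, -0.779553, 0.717374, 2.250258,
        -0.801028, 0.306408, 0.538970, -2.120528, 1.066903]
df = pd.DataFrame({"vals": vals})

pos = df["vals"] > 0
neg = df["vals"] < 0

# One qcut per sign; astype(object) turns each categorical into plain
# Interval objects so the two results can live in one column.
bins = pd.concat([
    pd.qcut(df.loc[neg, "vals"], q=4).astype(object),
    pd.qcut(df.loc[pos, "vals"], q=4).astype(object),
])

# Assignment aligns on the index; rows equal to zero (none here) would get NaN.
df["bin"] = bins
print(df)
```

With 8 negative and 12 positive values in this sample, each negative bin holds 2 values and each positive bin holds 3.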
As a result, the values fall into 8 distinct bins, 4 for the positive values and 4 for the negative values:
>>> df
vals bin
0 -0.456460 (-0.695, -0.351]
1 0.448368 (0.276, 0.608]
2 0.186750 (0.125, 0.276]
3 1.056617 (0.812, 2.25]
4 -0.035620 (-0.351, -0.00507]
5 -0.609843 (-0.695, -0.351]
6 0.126376 (0.125, 0.276]
7 0.160817 (0.125, 0.276]
8 -1.495441 (-2.122, -0.975]
9 0.730763 (0.608, 0.812]
10 -0.005071 (-0.351, -0.00507]
11 0.677918 (0.608, 0.812]
12 -0.779553 (-0.975, -0.695]
13 0.717374 (0.608, 0.812]
14 2.250258 (0.812, 2.25]
15 -0.801028 (-0.975, -0.695]
16 0.306408 (0.276, 0.608]
17 0.538970 (0.276, 0.608]
18 -2.120528 (-2.122, -0.975]
19 1.066903 (0.812, 2.25]
You can sort the bins to inspect them, which makes the 4 negative bins and the 4 positive bins easy to see:
np.sort(df['bin'].unique())
array([Interval(-2.1219999999999999, -0.97499999999999998, closed='right'),
Interval(-0.97499999999999998, -0.69499999999999995, closed='right'),
Interval(-0.69499999999999995, -0.35099999999999998, closed='right'),
Interval(-0.35099999999999998, -0.0050699999999999999, closed='right'),
Interval(0.125, 0.27600000000000002, closed='right'),
Interval(0.27600000000000002, 0.60799999999999998, closed='right'),
Interval(0.60799999999999998, 0.81200000000000006, closed='right'),
Interval(0.81200000000000006, 2.25, closed='right')], dtype=object)
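The question asks for binning within each row rather than within one column. A rough sketch of how the same two-qcut idea might be applied row by row is shown below. The helper `bin_row` and the sample frame `wide` are hypothetical and not part of the original answer. The sketch returns integer codes instead of intervals (0 to 3 for negatives, 4 to 7 for positives), leaves zeros and NaNs as NaN, and skips a side that has fewer than `q` distinct values. Passing `duplicates="drop"` is a design choice that avoids an error on tied bin edges, at the cost of possibly producing fewer than `q` bins on that side.

```python
import numpy as np
import pandas as pd

def bin_row(row: pd.Series, q: int = 4) -> pd.Series:
    """Hypothetical helper: quantile-bin negatives and positives separately."""
    codes = pd.Series(np.nan, index=row.index, dtype="float64")
    # Negatives get codes 0..q-1, positives get codes q..2q-1.
    for mask, offset in ((row < 0, 0), (row > 0, q)):
        side = row[mask]
        if side.nunique() < q:
            continue  # too few distinct values for q bins; leave as NaN
        side_codes = pd.qcut(side, q=q, labels=False, duplicates="drop")
        codes.loc[mask] = side_codes.to_numpy() + offset
    return codes

# Made-up sample: rows with different numbers of integers and NaNs.
wide = pd.DataFrame([
    [-8, -7, -6, -5, -4, -3, -2, -1, 1, 2, 3, 4, 5, 6, 7, 8, np.nan],
    [-4, -3, -2, -1, 0, 1, 2, 3, 4] + [np.nan] * 8,
])

binned = wide.apply(bin_row, axis=1)
print(binned)
```

Because each row is binned on its own quantiles, the same value can receive a different code in different rows.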