Creating bins based on condition

Question

My original dataset looks like the example below:

| id | old_a | new_a | old_b | new_b | ratio_a  | ratio_b |
|----|-------|-------|-------|-------|----------|---------|
| 1  | 350   | 6     | 35    | 0     | 58.33333 | Inf     |
| 2  | 164   | 79    | 6     | 2     | 2.075949 | 3       |
| 3  | 10    | 0     | 1     | 1     | Inf      | 1       |
| 4  | 120   | 1     | 10    | 0     | 120      | Inf     |

Here is the DataFrame:

import pandas as pd
data = [[1,350,6,35,0],[2,164,79,6,2],[3,10,0,1,1],[4,120,1,10,0]]
df = pd.DataFrame(data, columns=['id','old_a','new_a','old_b','new_b'])

I obtained the 'ratio_a' and 'ratio_b' columns (as shown in the table) with the following code:

df['ratio_a']= df['old_a']/df['new_a']
df['ratio_b']= df['old_b']/df['new_b']
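The Inf entries in the table come from dividing by zero: for numeric pandas columns, a positive number divided by 0 yields inf instead of raising an error. A minimal sketch of that behavior, using two rows from the example (the variable names `old`, `new`, and `ratio` are only for illustration):

```python
import numpy as np
import pandas as pd

old = pd.Series([350, 10])
new = pd.Series([6, 0])

# Element-wise division; the second row divides by zero.
ratio = old / new

# Flag which ratios are infinite.
print(np.isinf(ratio).tolist())
```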

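Next, I want to create two more columns containing the numeric range that each value of ratio_a and ratio_b falls into. To do this, I wrote the following code: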

bins = [0,10,20,30,40,50,60,70,80,90,100]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
df['a_range'] = pd.cut(df['ratio_a'], bins=bins, labels=labels, include_lowest=True)
df['b_range'] = pd.cut(df['ratio_b'], bins=bins, labels=labels, include_lowest=True)
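For context, a minimal sketch of why this code does not yet produce a ">100" bucket: pd.cut returns NaN for any value that lies outside the outermost bin edges, so a ratio of 120 is left unlabeled when the last edge is 100. The `ratios` series below is a hand-picked sample, not the full dataset.

```python
import pandas as pd

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]

# Two in-range ratios and one above the last edge (100).
ratios = pd.Series([58.333333, 2.075949, 120.0])
ranges = pd.cut(ratios, bins=bins, labels=labels, include_lowest=True)

# The first two values get a label; 120.0 comes back as NaN.
print(ranges.tolist())
```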

One problem I'm facing is that any value in ratio_a or ratio_b greater than 100 should fall into a ">100" bucket. How can I do this? My final result should look like this:

| id | old_a | new_a | old_b | new_b | ratio_a  | ratio_b | a_range | b_range |
|----|-------|-------|-------|-------|----------|---------|---------|---------|
| 1  | 350   | 6     | 35    | 0     | 58.33333 | Inf     | 50-60   | NaN     |
| 2  | 164   | 79    | 6     | 2     | 2.075949 | 3       | 0-10    | 0-10    |
| 3  | 10    | 0     | 1     | 1     | Inf      | 1       | NaN     | 0-10    |
| 4  | 120   | 1     | 10    | 0     | 120      | Inf     | >100    | NaN     |

Answer 1

One possible solution:

import numpy as np
bins = [0,10,20,30,40,50,60,70,80,90,100,np.inf]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
labels[-1] = ">100"
df['a_range'] = pd.cut(df['ratio_a'], bins=bins, labels=labels, include_lowest=True)
df['b_range'] = pd.cut(df['ratio_b'], bins=bins, labels=labels, include_lowest=True)

Result:

id  old_a  new_a  old_b  new_b     ratio_a  ratio_b a_range b_range
 1    350      6     35      0   58.333333      inf   50-60     NaN
 2    164     79      6      2    2.075949      3.0    0-10    0-10
 3     10      0      1      1         inf      1.0     NaN    0-10
 4    120      1     10      0  120.000000      inf    >100     NaN
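One thing to watch with this approach: because the last edge is np.inf and bins are closed on the right by default, an infinite ratio may be assigned to ">100" rather than left as NaN, depending on how pandas treats that edge. A minimal sketch, assuming infinite ratios should stay unlabeled as in the desired output, is to mask them before binning. The helper name `to_range` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, np.inf]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
labels[-1] = '>100'

def to_range(ratio):
    # Hypothetical helper: turn +/-inf into NaN so pd.cut leaves it unlabeled.
    finite = ratio.replace([np.inf, -np.inf], np.nan)
    return pd.cut(finite, bins=bins, labels=labels, include_lowest=True)

df = pd.DataFrame(
    [[1, 350, 6, 35, 0], [2, 164, 79, 6, 2], [3, 10, 0, 1, 1], [4, 120, 1, 10, 0]],
    columns=['id', 'old_a', 'new_a', 'old_b', 'new_b'],
)
df['a_range'] = to_range(df['old_a'] / df['new_a'])
df['b_range'] = to_range(df['old_b'] / df['new_b'])
print(df[['id', 'a_range', 'b_range']])
```

Masking first makes the treatment of infinite values explicit instead of relying on how the open-ended last bin is evaluated.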

python

bins

pandas