How to use multiple conditions, including selecting on quantile in Python

Question

Imagine the following dataset df:

Row	Population_density	Distance
1	400	50
2	500	30
3	300	40
4	200	120
5	500	60
6	1000	50
7	3300	30
8	500	90
9	700	100
10	1000	110
11	900	200
12	850	30

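How can I create a dummy variable that contains 1 when the value of df['Population_density'] is above the third quartile (>75%) AND df['Distance'] is below 100, and 0 for the rest of the data? Consequently, rows 6 and 7 should be 1 and all other rows should be 0.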

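Creating a dummy variable from a single criterion is fairly easy. For example, the following creates a new dummy variable that contains 1 when Distance < 100 and 0 otherwise: df['Distance_Below_100'] = np.where(df['Distance'] < 100, 1, 0). However, I don't know how to combine conditions when one of them involves a quantile selection (in this case, the upper 25% of the variable Population_density).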

import pandas as pd  
  
# assign data of lists.  
data = {'Row': range(1,13,1), 'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
        'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30]}  
  
# Create DataFrame  
df = pd.DataFrame(data)
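To see which rows fall in the upper 25%, it can help to compute the cut-off explicitly. A minimal sketch using the question's data; it relies on the default linear interpolation of pandas' Series.quantile, so a different interpolation setting may give a different threshold:

```python
import pandas as pd

# Same data as in the question
data = {'Row': range(1, 13),
        'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
        'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30]}
df = pd.DataFrame(data)

# 75th percentile (third quartile); linear interpolation is the default
threshold = df['Population_density'].quantile(0.75)
print(threshold)  # 925.0

# Rows above the third quartile, ignoring Distance for now
above = df.loc[df['Population_density'] > threshold, 'Row'].tolist()
print(above)  # [6, 7, 10]
```

Row 10 is also above the threshold, but its Distance of 110 fails the second condition, which is why only rows 6 and 7 should get a 1.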

Answer 1

Hi, I recommend using a lambda with DataFrame.apply to run a function over the DataFrame.

For example, suppose this is your function:

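def myFunction(value):
    # Put your own logic here; the return value goes into the new column
    return value

Create a new column 'new_column', where pick_cell is the name of the column whose values you want to pass to the function (axis=1 makes apply work row by row):

df['new_column'] = df.apply(lambda x: myFunction(x.pick_cell), axis=1)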
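Applied to the question, the row-wise approach might look like the sketch below. The helper name is_dense_and_close and the column name Dummy are hypothetical; the threshold is computed once outside the function so it is not recomputed for every row:

```python
import pandas as pd

data = {'Row': range(1, 13),
        'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
        'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30]}
df = pd.DataFrame(data)

# Compute the 75th percentile once, outside the row function
threshold = df['Population_density'].quantile(0.75)

def is_dense_and_close(row):
    """Hypothetical helper: 1 if density is above the 75th percentile and distance < 100, else 0."""
    return int(row['Population_density'] > threshold and row['Distance'] < 100)

# axis=1 passes each row (as a Series) to the function
df['Dummy'] = df.apply(is_dense_and_close, axis=1)
flagged = df.loc[df['Dummy'] == 1, 'Row'].tolist()
print(flagged)  # [6, 7]
```

Because apply calls a Python function once per row, this is usually slower than a vectorized np.where approach on large DataFrames.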

Answer 2

You can combine conditions with & (and) or | (or):

import numpy as np

# 1 where density is above its 75th percentile AND distance is below 100, else 0
df['Distance_Below_100'] = np.where(df['Population_density'].gt(df['Population_density'].quantile(0.75)) & df['Distance'].lt(100), 1, 0)

print(df)

    Row  Population_density  Distance  Distance_Below_100
0     1                 400        50                   0
1     2                 500        30                   0
2     3                 300        40                   0
3     4                 200       120                   0
4     5                 500        60                   0
5     6                1000        50                   1
6     7                3300        30                   1
7     8                 500        90                   0
8     9                 700       100                   0
9    10                1000       110                   0
10   11                 900       200                   0
11   12                 850        30                   0
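The same mask can also be written with plain comparison operators instead of .gt and .lt. A sketch, with hypothetical column names Dense_and_close and Dense_or_close; each comparison must be wrapped in parentheses, because & and | bind more tightly than > and < in Python:

```python
import pandas as pd

data = {'Row': range(1, 13),
        'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
        'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30]}
df = pd.DataFrame(data)

q75 = df['Population_density'].quantile(0.75)

# Parentheses are required; without them the expression is parsed
# differently and typically raises an error
both = (df['Population_density'] > q75) & (df['Distance'] < 100)
either = (df['Population_density'] > q75) | (df['Distance'] < 100)

# A boolean Series converts directly to a 0/1 dummy
df['Dense_and_close'] = both.astype(int)
df['Dense_or_close'] = either.astype(int)
print(df['Dense_and_close'].tolist())  # [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
print(df['Dense_or_close'].tolist())   # [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1]
```

Calling .astype(int) on the boolean mask gives the same 0/1 result as np.where(mask, 1, 0).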
