Python: create a new column using multiple conditions
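I'm just starting out with Python, and I have a large set of subjects with their body mass index (BMI), among other data.
I need to create a new column (called OMS) that states whether each subject is "normal", "overweight", "obese", and so on.
But I can't find the right way to do it. I tried np.where, but that only handles two conditions.
I also tried if, elif and else without success, as well as: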
df['oms'] = np.nan
df['oms'].loc[(df['IMC'] <=18.5 )] = "slim"
df['oms'].loc[(df['IMC'] >= 18.5) & (df['IMC'] <25 )] = "normal"
df['oms'].loc[(df['IMC'] >= 25) & (df['IMC'] <=30 )] = "overweight"
df['oms'].loc[(df['IMC'] > 30)] = "obese"
Any ideas? I'm stuck.
df.loc[df['IMC'].lt(18.5), 'oms'] = "slim"
df.loc[df['IMC'].ge(18.5) & df['IMC'].lt(25), 'oms'] = "normal"
df.loc[df['IMC'].ge(25) & df['IMC'].lt(30), 'oms'] = "overweight"
df.loc[df['IMC'].ge(30), 'oms'] = "obese"
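The difference from the attempt in the question is that the row mask and the column label go into a single df.loc call, so the assignment targets the DataFrame itself. The chained form df['oms'].loc[...] = ... can end up writing to a temporary object instead. Here is a minimal, self-contained sketch of the same approach; the sample BMI values are made up for illustration:

```python
import pandas as pd

# Sample BMI values invented for illustration; the column name IMC follows the question.
df = pd.DataFrame({"IMC": [17.0, 18.5, 24.9, 25.0, 30.0, 41.2]})

imc = df["IMC"]

# One .loc call per category: boolean row mask first, column label second.
# The first assignment creates the 'oms' column; unmatched rows stay NaN until filled.
df.loc[imc.lt(18.5), "oms"] = "slim"
df.loc[imc.ge(18.5) & imc.lt(25), "oms"] = "normal"
df.loc[imc.ge(25) & imc.lt(30), "oms"] = "overweight"
df.loc[imc.ge(30), "oms"] = "obese"

print(df)
```

Because the four masks do not overlap and together cover every non-missing value, the order of the assignments does not matter here.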
You can also use pd.cut:
import pandas as pd

bins = [0, 18.5, 25, 30, 9999]
labels = ['slim', 'normal', 'overweight', 'obese']
df = pd.DataFrame({'IMC': [15, 20, 27, 40]})
df['oms'] = pd.cut(df['IMC'], bins, labels=labels)
>>> df
   IMC         oms
0   15        slim
1   20      normal
2   27  overweight
3   40       obese
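One detail worth checking: pd.cut closes each bin on the right by default, so with the bins above a BMI of exactly 25 falls into "normal" rather than "overweight". If the boundaries should behave like the lt/ge comparisons instead, passing right=False is one option. The sketch below assumes that is the intent; the sample values and the float("inf") upper edge are my own choices, not part of the original answer:

```python
import pandas as pd

# Values chosen to sit exactly on the bin edges (illustrative only).
df = pd.DataFrame({"IMC": [15, 18.5, 25, 30, 40]})

# float("inf") as the last edge avoids picking an arbitrary maximum such as 9999.
bins = [0, 18.5, 25, 30, float("inf")]
labels = ["slim", "normal", "overweight", "obese"]

# right=False makes the intervals [0, 18.5), [18.5, 25), [25, 30), [30, inf).
df["oms"] = pd.cut(df["IMC"], bins=bins, labels=labels, right=False)

print(df)
```

The resulting column has a categorical dtype; call .astype(str) on it if plain strings are preferred.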
Maybe try:
df['oms'] = ""  # initialize with strings (not NaN) so the column is not float dtype
df.loc[df['IMC'] < 18.5, 'oms'] = "slim"
df.loc[(df['IMC'] >= 18.5) & (df['IMC'] < 25), 'oms'] = "normal"
df.loc[(df['IMC'] >= 25) & (df['IMC'] <= 30), 'oms'] = "overweight"
df.loc[df['IMC'] > 30, 'oms'] = "obese"
You can use a lambda function with apply on a pandas DataFrame.
I created a dummy data file:
bmi,height
20,72
22,73
26,77
5,66
13,60
Import the data file:
df = pd.read_csv('data.txt', header=0)
Create the column filled with NaN, as you did (although you don't have to):
df["oms"] = np.nan
Then use a lambda to compare the 'bmi' column against the conditions:
df['oms'] = df['bmi'].apply(lambda x: 'slim' if x < 18.5 else ('normal' if x<25 else ('overweight' if x<30 else 'obese')))
The data then looks like this:
print(df.head())
   bmi  height         oms
0   20      72      normal
1   22      73      normal
2   26      77  overweight
3    5      66        slim
4   13      60        slim
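Nested conditional expressions inside a lambda get hard to read as categories are added. The same thresholds can be written as a small named function and passed to apply. This is only a sketch: the function name classify_bmi is hypothetical, and the DataFrame is built in memory with the dummy values above rather than read from data.txt:

```python
import pandas as pd


def classify_bmi(bmi: float) -> str:
    """Map a BMI value to a category using the thresholds from the lambda above."""
    if bmi < 18.5:
        return "slim"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


# Same dummy values as the data file, built in memory so the example is self-contained.
df = pd.DataFrame({"bmi": [20, 22, 26, 5, 13], "height": [72, 73, 77, 66, 60]})

# apply calls classify_bmi once per value of the 'bmi' column.
df["oms"] = df["bmi"].apply(classify_bmi)

print(df)
```

apply runs a Python function row by row, so on very large tables the vectorized approaches in the other answers will usually be faster.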
Use numpy.select. I like this alternative because it is versatile and you can easily add or remove conditions.
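import numpy as np

# Conditions are checked in order; the first one that matches wins.
condlist = [df["IMC"] < 18.5,
            (df["IMC"] >= 18.5) & (df["IMC"] < 25),
            (df["IMC"] >= 25) & (df["IMC"] <= 30),
            df["IMC"] > 30]
condchoice = ["slim", "normal", "overweight", "obese"]
# default fills rows that match no condition (e.g. missing BMI) with a string
df["oms"] = np.select(condlist, condchoice, default="")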
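Since np.select takes the first condition that matches, the conditions can also be listed in ascending order with only an upper bound each. The sketch below shows that variant on made-up values, including a missing BMI; the "unknown" default label is my own assumption and not part of the original answer:

```python
import numpy as np
import pandas as pd

# Illustrative values, including a missing BMI to show what default does.
df = pd.DataFrame({"IMC": [15.0, 18.2, 22.0, 27.5, 30.0, 35.0, np.nan]})

# Checked top to bottom; a row takes the choice of the first True condition.
condlist = [
    df["IMC"] < 18.5,
    df["IMC"] < 25,
    df["IMC"] < 30,
    df["IMC"] >= 30,
]
choicelist = ["slim", "normal", "overweight", "obese"]

# Comparisons with NaN are always False, so the missing row gets the default label.
df["oms"] = np.select(condlist, choicelist, default="unknown")

print(df)
```

Passing a string default also keeps the choices and the default on a common dtype, which avoids relying on NumPy's integer default of 0 for unmatched rows.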