Create a column to categorize numerical values in python
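I have a dataframe in Python called df that stores each customer's BMI as a number in a column named 'bmi'. I want to add a new column, 'bmi_cat', holding the BMI category based on that number (i.e. below 18.5 is underweight, 18.5 to 24.9 is healthy, etc.).
This is what I tried, but it didn't work. It doesn't like the use of 'for'.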
df['bmi_cat'] = for i in df['bmi'] :
    if i < 18.5 :
        df['bmi_cat'] == 'underweight'
    elif i >= 18.5 and i < 25 :
        df['bmi_cat'] == 'healthy'
    elif i >= 25 and i < 30 :
        df['bmi_cat'] == 'overweight'
    else :
        df['bmi_cat'] == 'obese'
I'm just learning Python... any help you can give would be greatly appreciated!
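The attempt fails for two reasons: a for loop is a statement, so it cannot appear on the right-hand side of =, and == compares values instead of assigning them. A minimal sketch that keeps the same if/elif idea moves the logic into a function and applies it to each value with Series.apply. The sample df and the function name bmi_category are illustrative assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df
df = pd.DataFrame({"bmi": [17.0, 18.5, 22.3, 25.0, 29.9, 30.0]})


def bmi_category(bmi):
    """Map one numeric BMI to a category label; each lower bound is inclusive."""
    if pd.isna(bmi):
        return None  # keep missing values missing instead of falling through to "obese"
    if bmi < 18.5:
        return "underweight"
    elif bmi < 25:
        return "healthy"
    elif bmi < 30:
        return "overweight"
    else:
        return "obese"


# apply() calls bmi_category once per value and builds a new Series from the results
df["bmi_cat"] = df["bmi"].apply(bmi_category)
print(df)
```

Because apply runs Python code row by row, it is easy to read but slower on large frames than the vectorized np.select and pd.cut approaches.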
You have a list of conditions, each with a corresponding value to select, so you can use np.select:
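import numpy as np

bmi = df["bmi"]
cond_list = [bmi < 18.5, bmi < 25, bmi < 30, bmi >= 30]
choice_list = ["underweight", "healthy", "overweight", "obese"]
# np.select's built-in default is 0, which recent NumPy versions refuse to mix with
# string choices; a string default ("unknown" here) also labels rows, such as missing BMIs, that match no condition
df["bmi_cat"] = np.select(cond_list, choice_list, default="unknown")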
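It checks the conditions in cond_list from left to right and, at the first one that is True, assigns the value at the same position in choice_list. That is why bmi < 25 needs no separate bmi >= 18.5 check: any value below 18.5 has already matched the first condition.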
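To see the first-match rule in action, here is a small sketch on hypothetical BMI values placed on each category boundary, plus one missing value. The "unknown" default label is an assumption; any string works.

```python
import numpy as np
import pandas as pd

# Hypothetical values: one on each side of the boundaries, plus a missing BMI
df = pd.DataFrame({"bmi": [18.4, 18.5, 24.9, 25.0, 30.0, np.nan]})
bmi = df["bmi"]

# 18.4 also satisfies bmi < 25 and bmi < 30, but np.select
# uses the first condition in the list that is True
cond_list = [bmi < 18.5, bmi < 25, bmi < 30, bmi >= 30]
choice_list = ["underweight", "healthy", "overweight", "obese"]

# NaN is False under every comparison, so it receives the default label
df["bmi_cat"] = np.select(cond_list, choice_list, default="unknown")
print(df)
```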
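You can use pd.cut:
import numpy as np
import pandas as pd

bins = [-np.inf, 18.5, 25, 30, np.inf]
labels = ["underweight", "healthy", "overweight", "obese"]
# right=False makes the bins [18.5, 25), [25, 30), ..., so 18.5 counts as healthy as the question specifies
df['bmi_cat'] = pd.cut(df['bmi'], bins=bins, labels=labels, right=False)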
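Which side of each bin is closed matters for values that sit exactly on an edge. pd.cut defaults to right=True, giving intervals such as (18.5, 25], whereas the question's rule (18.5 to 24.9 is healthy) needs [18.5, 25). This sketch compares both settings on hypothetical edge values.

```python
import numpy as np
import pandas as pd

bins = [-np.inf, 18.5, 25, 30, np.inf]
labels = ["underweight", "healthy", "overweight", "obese"]
# Hypothetical values lying exactly on the bin edges
bmi = pd.Series([18.5, 25.0, 30.0])

# Default right=True: intervals are closed on the right, so each edge value
# falls into the lower category
default_cats = pd.cut(bmi, bins=bins, labels=labels).tolist()

# right=False: intervals are closed on the left, matching "below 18.5 is underweight"
fixed_cats = pd.cut(bmi, bins=bins, labels=labels, right=False).tolist()

print(default_cats)
print(fixed_cats)
```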
Without pandas, the same if/elif chain works on a plain list of dicts:
customers = [
    {"name": "Ken", "bmi": 24},
    {"name": "ben", "bmi": 18.5},
    {"name": "sarah", "bmi": 18.4},
    {"name": "dave", "bmi": 12},
    {"name": "kenneth", "bmi": 18},
    {"name": "dylan", "bmi": 25},
    {"name": "scott", "bmi": 30},
]

for customer in customers:
    # Each elif only runs when the earlier tests failed, so only upper bounds are needed
    if customer["bmi"] < 18.5:
        customer['bmi_cat'] = 'underweight'
    elif customer["bmi"] < 25:
        customer['bmi_cat'] = 'healthy'
    elif customer["bmi"] < 30:
        customer['bmi_cat'] = 'overweight'
    else:
        customer['bmi_cat'] = 'obese'
    print("Customer {name} has BMI {bmi} category {bmi_cat}".format(
        name=customer["name"],
        bmi=customer["bmi"],
        bmi_cat=customer["bmi_cat"])
    )
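If the records start out as a list of dicts like this, one option is to load them into a DataFrame with pd.DataFrame and categorize the whole column at once. This sketch reuses the customer records above with pd.cut; right=False is an assumption chosen so the result agrees with the loop's strict < comparisons.

```python
import numpy as np
import pandas as pd

# The same customer records as in the loop above
customers = [
    {"name": "Ken", "bmi": 24},
    {"name": "ben", "bmi": 18.5},
    {"name": "sarah", "bmi": 18.4},
    {"name": "dave", "bmi": 12},
    {"name": "kenneth", "bmi": 18},
    {"name": "dylan", "bmi": 25},
    {"name": "scott", "bmi": 30},
]

# One row per dict; the keys become the column names "name" and "bmi"
df = pd.DataFrame(customers)

# Vectorized equivalent of the if/elif chain
df["bmi_cat"] = pd.cut(
    df["bmi"],
    bins=[-np.inf, 18.5, 25, 30, np.inf],
    labels=["underweight", "healthy", "overweight", "obese"],
    right=False,  # [18.5, 25), [25, 30), ... just like the loop's "< 25", "< 30" tests
)
print(df)
```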