How to structure a Python function that takes input from a DataFrame to calculate a specific indicator
I'm a beginner Python coder and I want to build a Python function that calculates a specific indicator.
For example, the data looks like this:
ID status Age Gender
01 healthy 16 Male
02 un_healthy 14 Female
03 un_healthy 22 Male
04 healthy 12 Female
05 healthy 33 Female
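I want to build a function that calculates the percentage of healthy people as healthy / (healthy + un_healthy), separately for males and females aged 15 or older: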
def health_rate(healthy, un_healthy, age, gender):
    # count() is a placeholder here: Python has no built-in count() function
    if age >= 15:
        if gender == "Male":
            rate_male = count(healthy) / (count(healthy) + count(un_healthy))
            return rate_male
        else:
            rate_female = count(healthy) / (count(healthy) + count(un_healthy))
            return rate_female
    else:
        return "underage"
Then I just use .apply, but the logic isn't right and I still don't get the output I want.
I want to return the male rate and the female rate.
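One part of the formula can be checked in isolation: in Python, / binds tighter than +, so healthy / healthy + un_healthy is evaluated as (healthy / healthy) + un_healthy, and the denominator needs its own parentheses. A minimal sketch, using the totals from the sample table (3 healthy, 2 un_healthy) as plain integers in place of the count() calls:

```python
# Totals taken from the sample table: 3 healthy rows, 2 un_healthy rows.
healthy = 3
un_healthy = 2

# Without parentheses, division happens first: (3 / 3) + 2
wrong = healthy / healthy + un_healthy
# With parentheses, the sum becomes the denominator: 3 / (3 + 2)
right = healthy / (healthy + un_healthy)

print(wrong)
print(right)
```

The first result is 3.0, which is not a rate at all; the second is 0.6, i.e. 60% healthy, matching the value_counts output in the answer below.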
df[col_name].value_counts(normalize=True)
gives you the proportion of each value in the column you want. Here is how to parameterize it:
def health_percentages(df, col_name):
    # Share of each distinct value in col_name, as a percentage
    return df[col_name].value_counts(normalize=True) * 100
Example:
import pandas as pd

data = [[1, 'healthy', 16, 'M'], [2, 'un_healthy', 14, 'F'], [3, 'un_healthy', 22, 'M'],
        [4, 'healthy', 12, 'F'], [5, 'healthy', 33, 'F']]
df = pd.DataFrame(data, columns=['ID', 'status', 'Age', 'Gender'])
print(df)
print(health_percentages(df, 'status'))
#output:
ID status Age Gender
0 1 healthy 16 M
1 2 un_healthy 14 F
2 3 un_healthy 22 M
3 4 healthy 12 F
4 5 healthy 33 F
healthy 60.0
un_healthy 40.0
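To get the male and female rates the question asks for, the same value_counts idea can be applied within each group. A sketch, assuming the M/F coding from the example above and the age >= 15 filter from the question; health_percentages_by is a hypothetical name, not part of pandas:

```python
import pandas as pd

data = [[1, 'healthy', 16, 'M'], [2, 'un_healthy', 14, 'F'], [3, 'un_healthy', 22, 'M'],
        [4, 'healthy', 12, 'F'], [5, 'healthy', 33, 'F']]
df = pd.DataFrame(data, columns=['ID', 'status', 'Age', 'Gender'])


def health_percentages_by(df, col_name, by, min_age=15):
    # Hypothetical helper: keep only rows at or above the age threshold
    adults = df[df['Age'] >= min_age]
    # value_counts(normalize=True) inside each group gives per-group shares
    return adults.groupby(by)[col_name].value_counts(normalize=True) * 100


result = health_percentages_by(df, 'status', 'Gender')
print(result)
```

The result is indexed by (Gender, status); within each gender the percentages sum to 100, and the healthy entries are the male and female healthy rates.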
You can use pivot_table (df is your DataFrame):
df = df[df.Age >= 15].pivot_table(
index="status", columns="Gender", values="ID",
aggfunc="count", margins=True, fill_value=0
)
Result for the example DataFrame:
Gender Female Male All
status
healthy 1 1 2
un_healthy 0 1 1
All 1 2 3
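For reference, pd.crosstab builds the same count table directly from two columns, without naming a values column to count. A sketch on the question's data (Gender spelled out as Male/Female), assuming the same age >= 15 filter:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "status": ["healthy", "un_healthy", "un_healthy", "healthy", "healthy"],
    "Age": [16, 14, 22, 12, 33],
    "Gender": ["Male", "Female", "Male", "Female", "Female"],
})

adults = df[df.Age >= 15]
# crosstab counts each (status, Gender) pair; margins=True adds "All" totals
counts = pd.crosstab(adults["status"], adults["Gender"], margins=True)
print(counts)
```

crosstab also accepts normalize="columns"; combined with margins=True it returns column proportions, which multiplied by 100 correspond to the percentage table that follows.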
If you want percentages:
df = (df / df.loc["All", :] * 100).drop("All")
Result:
Gender Female Male All
status
healthy 100.0 50.0 66.666667
un_healthy 0.0 50.0 33.333333
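To end up with the function the question asks for, the pivot_table steps can be wrapped so that it returns one healthy rate per gender. A minimal sketch; the health_rate name and the min_age parameter are illustrative assumptions, and the data uses the Male/Female values from the question:

```python
import pandas as pd


def health_rate(df, min_age=15):
    # Hypothetical helper: percentage of healthy people per gender (plus "All"),
    # counting only rows with Age >= min_age, as in the pivot_table answer.
    counts = df[df.Age >= min_age].pivot_table(
        index="status", columns="Gender", values="ID",
        aggfunc="count", margins=True, fill_value=0
    )
    # Divide each column by its "All" total, then drop the total row
    pct = (counts / counts.loc["All", :] * 100).drop("All")
    # The "healthy" row holds the healthy rate for each gender column
    return pct.loc["healthy"]


df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "status": ["healthy", "un_healthy", "un_healthy", "healthy", "healthy"],
    "Age": [16, 14, 22, 12, 33],
    "Gender": ["Male", "Female", "Male", "Female", "Female"],
})

rates = health_rate(df)
print(rates["Male"], rates["Female"])
```

Unlike the row-wise .apply attempt in the question, this works on the whole filtered DataFrame at once, which is what counting healthy and un_healthy rows per gender requires.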