How to structure a Python function that takes input from a DataFrame to calculate a specific indicator

Question

I'm a beginner Python coder and I want to build a Python function that calculates a specific indicator.

For example, the data looks like this:

ID    status        Age    Gender
01    healthy       16     Male
02    un_healthy    14     Female
03    un_healthy    22     Male
04    healthy       12     Female
05    healthy       33     Female

I want to build a function that calculates the percentage of healthy people, dividing the healthy count by healthy + un_healthy:

def health_rate(healthy, un_healthy,age){
    if (age >= 15):
        if (gender == "Male"):
            return rateMale= (count(healthy)/count(healthy)+count(un_healthy))
        Else
            return rateFemale= (count(healthy)/count(healthy)+count(un_healthy))
    Else 
        return print("underage");

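Then I would just use .apply.

But the logic is wrong and I still don't get the output I want. I want to return the male rate and the female rate.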

Answer 1

df[col_name].value_counts(normalize=True) gives you the proportion of each value in the column you want. Here is how to parameterize it:

def health_percentages(df, col_name):
    return df[col_name].value_counts(normalize=True)*100

Example:

import pandas as pd

data = [[1, 'healthy', 16, 'M'], [2, 'un_healthy', 14, 'F'], [3, 'un_healthy', 22, 'M'], [4, 'healthy', 12, 'F'], [5, 'healthy', 33, 'F']]

df = pd.DataFrame(data, columns=['ID', 'status', 'Age', 'Gender'])
print(df)
print(health_percentages(df, 'status'))

#output:
   ID      status  Age Gender
0   1     healthy   16      M
1   2  un_healthy   14      F
2   3  un_healthy   22      M
3   4     healthy   12      F
4   5     healthy   33      F

healthy       60.0
un_healthy    40.0
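
The question also asks for separate male and female rates among people aged 15 or older. A minimal sketch of how the same value_counts(normalize=True) idea could be extended with an age filter and a groupby; the function name health_rate_by_gender and the min_age parameter are illustrative assumptions, not part of the original answer:

```python
import pandas as pd


def health_rate_by_gender(df, min_age=15):
    """Percentage of each status per gender, for rows with Age >= min_age.

    Hypothetical helper: the name and the min_age parameter are assumptions.
    """
    adults = df[df["Age"] >= min_age]
    # value_counts(normalize=True) within each gender gives per-gender proportions
    rates = adults.groupby("Gender")["status"].value_counts(normalize=True) * 100
    # Move the status level into columns; combinations that never occur become 0
    return rates.unstack(fill_value=0)


data = [
    [1, "healthy", 16, "M"],
    [2, "un_healthy", 14, "F"],
    [3, "un_healthy", 22, "M"],
    [4, "healthy", 12, "F"],
    [5, "healthy", 33, "F"],
]
df = pd.DataFrame(data, columns=["ID", "status", "Age", "Gender"])

rates = health_rate_by_gender(df)
print(rates)
```

For the sample data, rates.loc["M", "healthy"] is 50.0 and rates.loc["F", "healthy"] is 100.0, because only one female row passes the age filter.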

Answer 2

You can use pivot_table (with df being your DataFrame):

df = df[df.Age >= 15].pivot_table(
    index="status", columns="Gender", values="ID",
    aggfunc="count", margins=True, fill_value=0
)

Result for the sample DataFrame:

Gender      Female  Male  All
status                       
healthy          1     1    2
un_healthy       0     1    1
All              1     2    3

If you want percentages:

df = (df / df.loc["All", :] * 100).drop("All")

Result:

Gender      Female  Male        All
status                             
healthy      100.0  50.0  66.666667
un_healthy     0.0  50.0  33.333333
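
If only the per-gender percentages are needed, pd.crosstab with normalize="columns" could produce them in one step. This is a sketch of an alternative, not the method used in the answer above; it assumes the Male/Female labels from the question's sample data, and the names adults, pct, male_rate, and female_rate are illustrative:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": ["01", "02", "03", "04", "05"],
    "status": ["healthy", "un_healthy", "un_healthy", "healthy", "healthy"],
    "Age": [16, 14, 22, 12, 33],
    "Gender": ["Male", "Female", "Male", "Female", "Female"],
})

# Keep only people aged 15 or older, as in the question
adults = df[df["Age"] >= 15]

# normalize="columns" divides each column by its own total,
# so the values for each gender sum to 100 after scaling
pct = pd.crosstab(adults["status"], adults["Gender"], normalize="columns") * 100
print(pct)

# The two values the question asks for
male_rate = pct.loc["healthy", "Male"]
female_rate = pct.loc["healthy", "Female"]
```

Unlike the pivot_table version, this skips the intermediate counts and the "All" margin, so it fits best when the raw counts are not needed.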

Tags: python, indicator, dataframe