How to use pandas to count occurrences of specific text in Excel

Question

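This is my first time here and I have only just started learning to code. I am working on a clinical study of risk factors for a disease, and I have the patient data in Excel. The goal of the code is to count the number of risk factors (obesity, hypertension, diabetes, hyperlipidemia) for each patient (each row) and write the result to a new column. As a final step, I want to count how many patients have all 4 risk factors, how many have 3, 2, or only 1, and how many have none.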

The dataframe looks like this (just an example, so no confidentiality is breached): [image: part of the dataframe]

I tried to do this part in Python. I put the following code together myself:

import pandas as pd
df1=pd.DataFrame({'gender':['male','male','female','female','male'],'age':[49,60,65,20,65],
                  'obesity':['yes','yes','NaN','NaN','yes'],
                  'hypertension':['yes','yes','yes','NaN','yes'],
                  'diabetes':['NaN','yes','NaN','NaN','yes'],
                  'hyperlipidemia':['yes','yes','yes','NaN','NaN']})
factor_count=[] #to be written in the very right column
row=0
column=3
while row<=5:             #5 rows in total for this example
    count=0               #to count the risk factors of each row
    while column<=5:
        if df.iloc[row,column] == 'yes':         #probably my while loop is really stupid
            count+=1
            column+=1
    factor_count.append(count)
    row+=1
print(factor_count)

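After I hit run, the kernel never stopped. I am teaching myself to program, so I have no idea what happened, and I had to terminate the kernel. Can someone help me solve this?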

Answer 1

You can replace 'yes' in the dataframe with 1 and then use the sum method:

df1.replace('yes',1,inplace=True)
df1.iloc[:,[2,3,4,5]] = df1.iloc[:,[2,3,4,5]].astype(float)
df1["Numbers of factor"] = df1.iloc[:,[2,3,4,5]].sum(axis=1)
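
An alternative that leaves the original values untouched, as a minimal sketch: compare the risk-factor columns with 'yes' and sum the resulting booleans per row. The `risk_cols` list is a name introduced here for illustration, and the data is the example from the question, where missing values are the string 'NaN'.

```python
import pandas as pd

# Example data copied from the question; missing values are the string 'NaN'.
df1 = pd.DataFrame({
    'gender': ['male', 'male', 'female', 'female', 'male'],
    'age': [49, 60, 65, 20, 65],
    'obesity': ['yes', 'yes', 'NaN', 'NaN', 'yes'],
    'hypertension': ['yes', 'yes', 'yes', 'NaN', 'yes'],
    'diabetes': ['NaN', 'yes', 'NaN', 'NaN', 'yes'],
    'hyperlipidemia': ['yes', 'yes', 'yes', 'NaN', 'NaN'],
})

# Hypothetical helper name: the four risk-factor columns, selected by label.
risk_cols = ['obesity', 'hypertension', 'diabetes', 'hyperlipidemia']

# eq('yes') yields a boolean frame; summing along axis=1 counts the True values per row.
df1['Numbers of factor'] = df1[risk_cols].eq('yes').sum(axis=1)

print(df1['Numbers of factor'].tolist())
```

Selecting columns by label rather than by position keeps the count correct if columns are later added or reordered in the Excel sheet.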

A histogram of that column should then show how many patients have 1, 2, 3, ... risk factors:

df1["Numbers of factor"].hist()
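
If the actual numbers are wanted rather than a plot, `value_counts` can tally them. The sketch below assumes the per-patient counts for the example data are already available; `factor_counts` and `patients_per_count` are names made up for illustration.

```python
import pandas as pd

# Per-patient risk-factor counts for the five example patients (assumed already computed).
factor_counts = pd.Series([3, 4, 2, 0, 3], name='Numbers of factor')

# value_counts tallies each distinct count; reindex over 0..4 fills in
# the counts that never occur with 0 so that no category is missing.
patients_per_count = (
    factor_counts.value_counts()
    .reindex(range(5), fill_value=0)
)

print(patients_per_count.to_dict())
```

The `reindex` step matters for the "none" and "only one" categories: without it, a count that no patient has simply does not appear in the result.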

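As for why the kernel never stopped: in the loop from the question, `column+=1` is indented inside the `if`, so as soon as a cell is not 'yes' the column index stops advancing and the inner `while` runs forever. A few other details would also bite once that is fixed: `column` is never reset for the next row, starting at column 3 skips obesity (column 2), `row<=5` runs one row past the end, and the frame is named `df1`, not `df`. A possible corrected version of the loop, as a sketch using `for` with `range` instead of manual counters:

```python
import pandas as pd

# Example data copied from the question.
df1 = pd.DataFrame({
    'gender': ['male', 'male', 'female', 'female', 'male'],
    'age': [49, 60, 65, 20, 65],
    'obesity': ['yes', 'yes', 'NaN', 'NaN', 'yes'],
    'hypertension': ['yes', 'yes', 'yes', 'NaN', 'yes'],
    'diabetes': ['NaN', 'yes', 'NaN', 'NaN', 'yes'],
    'hyperlipidemia': ['yes', 'yes', 'yes', 'NaN', 'NaN'],
})

factor_count = []
for row in range(len(df1)):        # rows 0..4, never past the last row
    count = 0                      # reset the counter for each patient
    for column in range(2, 6):     # columns 2..5 are the four risk factors
        if df1.iloc[row, column] == 'yes':
            count += 1
    factor_count.append(count)

print(factor_count)
```

With `for` loops the index always advances, whether or not the cell matches, so the loop cannot hang the way the `while` version did.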