How to find the count of a specific value in a pandas DataFrame column and use it in calculations

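I have a pandas DataFrame like the one below. For each unique Domain, I want to calculate (Count(EV) + Count(PV) + Count(DV) + Count(GV), counting only the values equal to Green) divided by the total number of values in that unique domain.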

Domain EV PV DV GV Numerator(part) denominator(part) ideal Output
KA-BLR Green Blue Green 1 6 0.166
KA-BLR Green Green Blue 1 6 0.166
KL-TRV Green Blue Yellow Red 0.5 7 0.071
KL-TRV Green Blue Blue 0.5 7 0.071
KL-COK Blue Blue Yellow Green 0.25 4 0.0625
TN-CHN Green Blue 0.5 5 0.1
TN-CHN Green Blue Yellow 0.5 5 0.1

Sample code

OVER_ALL_SCORE = {}
for Domain in df_RR["Domain"].unique():

    # count of greens
    EV_G = (df_RR['EV'] == 'Green').sum()
    PV_G = (df_RR['PV'] == 'Green').sum()
    DV_G = (df_RR['DV'] == 'Green').sum()
    GV_G = (df_RR['GV'] == 'Green').sum()

    # count of all values excluding null
    EV = df_RR['EV'].sum()
    PV = df_RR['PV'].sum()
    DV = df_RR['DV'].sum()
    GV = df_RR['GV'].sum()

    # so (0.25*(count of greens in "DV") + 0.25*(count of greens in "PV")
    #     + 0.25*(count of greens in "EV") + 0.25*(count of greens in "GV"))
    #    / total count of values
    Numerator = (0.25 * EV_G) + (0.25 * PV_G) + (0.25 * DV_G) + (0.25 * GV_G)

    denominator = EV + PV + DV + GV

    try:
        OVER_ALL_SCORE[Domain] = Numerator / denominator
    except:
        OVER_ALL_SCORE[Domain] = 0

df_RR['Overall_score'] = df_RR['Domain'].map(OVER_ALL_SCORE)

    

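Currently this logic returns the same value for every domain. Please help me fix it.

Thanks in advance.

Here is a solution that produces the ideal output: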

OVER_ALL_SCORE = {}

for Domain in df_RR["Domain"].unique():

    # restrict the counts to the rows of the current domain
    sub_df = df_RR.loc[df_RR['Domain'] == Domain]

    # count of greens (a boolean Series sums to the number of True values)
    EV_G = (sub_df['EV'] == 'Green').sum()
    PV_G = (sub_df['PV'] == 'Green').sum()
    DV_G = (sub_df['DV'] == 'Green').sum()
    GV_G = (sub_df['GV'] == 'Green').sum()

    # count of all non-null values
    EV = sub_df['EV'].count()
    PV = sub_df['PV'].count()
    DV = sub_df['DV'].count()
    GV = sub_df['GV'].count()

    numerator = 0.25 * (EV_G + PV_G + DV_G + GV_G)
    denominator = EV + PV + DV + GV

    # the numerator is a NumPy float, so dividing by zero would give inf/nan
    # instead of raising; check for an empty domain explicitly
    OVER_ALL_SCORE[Domain] = numerator / denominator if denominator else 0

df_RR['Overall_score'] = df_RR['Domain'].map(OVER_ALL_SCORE)

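A few changes are key:

count() vs. sum()

To count all the values you need the count method rather than the sum method (otherwise the code just concatenates the string values in the column):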

df_RR['EV'].sum()

returns: 'GreenGreenGreenBlueGreen' (because sum simply adds together all the values in the Series, and adding strings concatenates them).

Use this instead:

df_RR['EV'].count()

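It works for your count of greens because df_RR['EV'] == 'Green' returns a Series of booleans, which sums correctly to the number of greens (True is added as 1 and False as 0):

True True True False True is equivalent to 1 1 1 0 1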
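
As a quick check of the difference between summing a boolean comparison and calling count(), here is a minimal sketch. The Series `ev` and its values are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical column: three greens, one blue, one missing value.
ev = pd.Series(["Green", "Green", None, "Blue", "Green"])

is_green = ev == "Green"            # boolean Series: True, True, False, False, True
green_count = int(is_green.sum())   # True counts as 1, False as 0
non_null_count = int(ev.count())    # count() skips the missing entry
total_rows = len(ev)                # len() includes the missing entry

print(green_count, non_null_count, total_rows)  # prints: 3 4 5
```

Note that count() ignores missing values, which is what the "count of all values excluding null" step in the question asks for.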

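The main problem

Currently your counts are the same on every loop iteration because you are not filtering by domain. I would create a sub-DataFrame for the domain you are looking at as the first step of the loop:

sub_df = df_RR.loc[df_RR['Domain'] == Domain]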
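
The same per-domain filtering can probably also be done without an explicit loop by using groupby. The sketch below is one possible variant, not the method from the answer above. It rebuilds the sample rows from the question, and the position of each blank cell is an assumption, since the flattened table does not show which column is empty (the per-domain result does not depend on it). The names `value_cols`, `greens_per_row`, and `values_per_row` are introduced here for illustration only.

```python
import pandas as pd

# Sample rows from the question. Which column each blank (None) belongs to
# is an assumption; it does not change the per-domain totals.
df_RR = pd.DataFrame(
    {
        "Domain": ["KA-BLR", "KA-BLR", "KL-TRV", "KL-TRV", "KL-COK", "TN-CHN", "TN-CHN"],
        "EV": ["Green", "Green", "Green", "Green", "Blue", "Green", "Green"],
        "PV": ["Blue", "Green", "Blue", "Blue", "Blue", "Blue", "Blue"],
        "DV": ["Green", "Blue", "Yellow", "Blue", "Yellow", None, "Yellow"],
        "GV": [None, None, "Red", None, "Green", None, None],
    }
)

value_cols = ["EV", "PV", "DV", "GV"]

# Per row: number of greens and number of non-null values across the four columns.
greens_per_row = df_RR[value_cols].eq("Green").sum(axis=1)
values_per_row = df_RR[value_cols].notna().sum(axis=1)

# Per domain: sum the row-level counts and broadcast the totals back to every row.
greens = greens_per_row.groupby(df_RR["Domain"]).transform("sum")
totals = values_per_row.groupby(df_RR["Domain"]).transform("sum")

# 0.25 * greens / total values, with 0 for a domain that has no values at all.
df_RR["Overall_score"] = (0.25 * greens / totals).where(totals > 0, 0.0)

print(df_RR[["Domain", "Overall_score"]].drop_duplicates())
```

With these assumed rows, the scores should line up with the ideal output column in the question (for example 1/6 for KA-BLR and 0.0625 for KL-COK).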