How to find the count of a specific value in a pandas DataFrame column and use it for calculations

Question

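I have a pandas DataFrame like the one shown below. For each unique Domain, I want to compute (Count(EV) + Count(PV) + Count(DV) + Count(GV) where the value is Green) / (total number of values in that unique domain).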

Domain	EV	PV	DV	GV	Numerator(part)	denominator(part)	ideal Output
KA-BLR	Green	Blue		Green	1	6	0.166
KA-BLR	Green	Green	Blue		1	6	0.166
KL-TRV	Green	Blue	Yellow	Red	0.5	7	0.071
KL-TRV	Green	Blue	Blue		0.5	7	0.071
KL-COK	Blue	Blue	Yellow	Green	0.25	4	0.0625
TN-CHN	Green		Blue		0.5	5	0.1
TN-CHN	Green	Blue		Yellow	0.5	5	0.1
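
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes the blank cells are missing values (NaN) and that the frame is named df_RR, as in the sample code. The three helper columns are left out because they are what the calculation is supposed to produce.

```python
import numpy as np
import pandas as pd

# Blank cells in the table are assumed to be missing values (NaN).
df_RR = pd.DataFrame(
    {
        "Domain": ["KA-BLR", "KA-BLR", "KL-TRV", "KL-TRV", "KL-COK", "TN-CHN", "TN-CHN"],
        "EV": ["Green", "Green", "Green", "Green", "Blue", "Green", "Green"],
        "PV": ["Blue", "Green", "Blue", "Blue", "Blue", np.nan, "Blue"],
        "DV": [np.nan, "Blue", "Yellow", "Blue", "Yellow", "Blue", np.nan],
        "GV": ["Green", np.nan, "Red", np.nan, "Green", np.nan, "Yellow"],
    }
)

print(df_RR)
```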

Sample code

OVER_ALL_SCORE = {}
for Domain in df_RR["Domain"].unique():

    # count of greens
    EV_G = (df_RR['EV'] == 'Green').sum()
    PV_G = (df_RR['PV'] == 'Green').sum()
    DV_G = (df_RR['DV'] == 'Green').sum()
    GV_G = (df_RR['GV'] == 'Green').sum()

    # count of all values excluding null
    EV = df_RR['EV'].sum()
    PV = df_RR['PV'].sum()
    DV = df_RR['DV'].sum()
    GV = df_RR['GV'].sum()

    # so (0.25*(SUM for "DV" of greens) + 0.25*(SUM for "PV" of greens)
    #     + 0.25*(SUM for "EV" of greens) + 0.25*(SUM for "GV" of greens))
    #    / total count of values
    Numerator = (0.25 * EV_G) + (0.25 * PV_G) + (0.25 * DV_G) + (0.25 * GV_G)
    denominator = EV + PV + DV + GV

    try:
        OVER_ALL_SCORE[Domain] = Numerator / denominator
    except:
        OVER_ALL_SCORE[Domain] = 0

df_RR['Overall_score'] = df_RR['Domain'].map(OVER_ALL_SCORE)

Currently this logic returns the same value for every domain. Please help me fix it.

Thanks in advance.

Answer 1

Here is a solution that gives the ideal output:

OVER_ALL_SCORE = {}

for Domain in df_RR["Domain"].unique():

    sub_df = df_RR.loc[df_RR['Domain']==Domain]

    #count of greens 

    EV_G = (sub_df['EV'] == 'Green').sum()
    PV_G = (sub_df['PV'] == 'Green').sum()
    DV_G = (sub_df['DV'] == 'Green').sum()
    GV_G = (sub_df['GV'] == 'Green').sum()

    #count of all values

    EV = sub_df['EV'].count()
    PV = sub_df['PV'].count()
    DV = sub_df['DV'].count()
    GV = sub_df['GV'].count()

    numerator = (0.25*EV_G) + (0.25*PV_G) + (0.25* DV_G) + (0.25* GV_G)
    denominator = EV+PV+DV+GV

    try:
        OVER_ALL_SCORE[Domain] = (numerator /denominator )
    except:
        OVER_ALL_SCORE[Domain] = 0

df_RR['Overall_score']=df_RR['Domain'].map(OVER_ALL_SCORE)

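A few changes are key:

count() vs. sum()

When counting all the values, you need to use the count method rather than the sum method (otherwise this code just concatenates the string values in the column):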

df_RR['EV'].sum()

returns: 'GreenGreenGreenBlueGreen' (because sum simply adds all the values in the Series together, and adding strings concatenates them).

Use this instead:

df_RR['EV'].count()

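The reason sum works for your count of greens is that df_RR['EV'] == 'Green' returns a Series of booleans, which sum correctly to give the number of greens (True is added as 1 and False as 0):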

True True True False True is equivalent to 1 1 1 0 1
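
A small check of the three behaviours described above. The Series s is made-up illustration data rather than the asker's column, and dtype=object is set explicitly so that the string values behave the same way across pandas versions:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value, for illustration only.
s = pd.Series(["Green", "Blue", np.nan, "Green"], dtype=object)

is_green = s == "Green"        # boolean Series; the missing entry compares as False
green_count = is_green.sum()   # True counts as 1, False as 0
non_null_count = s.count()     # counts only the non-missing entries

# On an object column of strings with no missing values, sum() concatenates.
joined = pd.Series(["Green", "Blue", "Green"], dtype=object).sum()

print(green_count, non_null_count, joined)
```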

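The main problem

Currently your counts are the same on every iteration of the loop, because you are not filtering by domain. I would create a sub-DataFrame for the domain you are looking at as the first step of the loop: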

sub_df = df_RR.loc[df_RR['Domain'] == Domain]
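
The same per-domain filtering can also be written without an explicit loop by grouping on Domain. The following is only a sketch of one possible alternative, not part of the fix above. It rebuilds the question's table assuming the blank cells are NaN, and the names value_cols, green_counts, total_counts and scores are made up for illustration.

```python
import numpy as np
import pandas as pd

# The question's table; blank cells are assumed to be NaN.
df_RR = pd.DataFrame(
    {
        "Domain": ["KA-BLR", "KA-BLR", "KL-TRV", "KL-TRV", "KL-COK", "TN-CHN", "TN-CHN"],
        "EV": ["Green", "Green", "Green", "Green", "Blue", "Green", "Green"],
        "PV": ["Blue", "Green", "Blue", "Blue", "Blue", np.nan, "Blue"],
        "DV": [np.nan, "Blue", "Yellow", "Blue", "Yellow", "Blue", np.nan],
        "GV": ["Green", np.nan, "Red", np.nan, "Green", np.nan, "Yellow"],
    }
)
value_cols = ["EV", "PV", "DV", "GV"]

# Greens per domain: compare to "Green", then sum the booleans within each group.
green_counts = (
    df_RR[value_cols].eq("Green").groupby(df_RR["Domain"]).sum().sum(axis=1)
)

# Non-null values per domain: count() skips NaN.
total_counts = df_RR.groupby("Domain")[value_cols].count().sum(axis=1)

# Same formula as the loop: 0.25 * greens / number of non-null values.
scores = 0.25 * green_counts / total_counts
df_RR["Overall_score"] = df_RR["Domain"].map(scores)

print(df_RR[["Domain", "Overall_score"]])
```

Under these assumptions the scores agree with the ideal Output column of the question (1/6 for KA-BLR, 0.5/7 for KL-TRV, 0.0625 for KL-COK and 0.1 for TN-CHN).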

