Add column based on multiple conditions

I have a silly question. My df looks like this:

       FID_2     STA_SID           s2            s1  Qh_STA  Qh_FID2  \
14 222143.00 26040713.00           0.00        0.00    8.00    17.00   
15 222143.00 26040713.00           0.00        8.00    6.00    17.00   
13 222143.00 26040713.00           6.00        8.00    3.00    17.00   
17       NaN 26033594.00 29445425.00        1707.00    5.00      nan   

I defined the following function and command:

A = 0.8

def seekDO(row):
    # Return 1 as soon as any of the three ratios falls below A.
    if row['Qh_STA'] / row['Qh_FID2'] < A:
        return 1
    if (row['Qh_STA'] + row['s1']) / row['Qh_FID2'] < A:
        return 1
    if (row['Qh_STA'] + row['s1'] + row['s2']) / row['Qh_FID2'] < A:
        return 1
    return 0

df['DO'] = df.apply(seekDO, axis=1)

The problem is that for DO I get

    DO   
14  1  
15  1  
13  1  
17  0 

instead of

    DO   
14  1  
15  0  
13  0  
17  0 

Can you see where I went wrong?

Maybe np.where:

import numpy as np

condition = ((df['Qh_STA'] / df['Qh_FID2'] < A) | ((df['Qh_STA'] + df['s1']) / df['Qh_FID2'] < A) | ((df['Qh_STA'] + df['s1'] + df['s2']) / df['Qh_FID2'] < A))

df['DO'] = np.where(condition, 1, 0)

But you should get

    DO
14  1
15  1
13  1
17  0

indeed.

Look at your values again.

    8 / 17 IS < 0.8
    6 / 17 IS < 0.8
    3 / 17 IS < 0.8

The output is correct; the output you expected is wrong.
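To make the arithmetic concrete, here is a minimal sketch that rebuilds the four sample rows (only the columns used in the checks, which is my own reduction of the frame) and prints the three ratios that seekDO compares against A. The names ratios, below and any_passes are made up for this illustration. Because the function returns 1 at the first check that passes, it behaves like "any check passes", and the first ratio alone is already below 0.8 for rows 14, 15 and 13.

```python
import numpy as np
import pandas as pd

A = 0.8

# Sample rows copied from the question (columns not used in the checks are omitted).
df = pd.DataFrame(
    {
        "s2": [0.0, 0.0, 6.0, 29445425.0],
        "s1": [0.0, 8.0, 8.0, 1707.0],
        "Qh_STA": [8.0, 6.0, 3.0, 5.0],
        "Qh_FID2": [17.0, 17.0, 17.0, np.nan],
    },
    index=[14, 15, 13, 17],
)

# One column per check performed in seekDO.
ratios = pd.DataFrame(
    {
        "r1": df["Qh_STA"] / df["Qh_FID2"],
        "r2": (df["Qh_STA"] + df["s1"]) / df["Qh_FID2"],
        "r3": (df["Qh_STA"] + df["s1"] + df["s2"]) / df["Qh_FID2"],
    }
)

# Comparisons against NaN evaluate to False, so row 17 never passes.
below = ratios < A

# Returning 1 at the first passing check is equivalent to "any check passes".
any_passes = below.any(axis=1).astype(int)

print(ratios.round(2))
print(any_passes.tolist())  # [1, 1, 1, 0]
```

Row 15 fails the second and third checks (14 / 17 is about 0.82), and row 13 fails the third (17 / 17 is 1.0), which is presumably why 0 was expected there. That expectation only holds if all three checks must pass.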

I believe you can test each condition on whole columns instead of looping with apply, which is slow:

A = 0.8

m1 = df['Qh_STA']/df['Qh_FID2'] < A 
m2 = (df['Qh_STA'] + df['s1'])/df['Qh_FID2'] < A
m3 = (df['Qh_STA'] + df['s1'] + df['s2']) / df['Qh_FID2'] < A

To match only rows where all conditions are True, chain the masks with & (element-wise AND):

df['DO'] = (m1 & m2 & m3).astype(int)
print(df)
       FID_2     STA_SID          s2      s1  Qh_STA  Qh_FID2  DO
14  222143.0  26040713.0         0.0     0.0     8.0     17.0   1
15  222143.0  26040713.0         0.0     8.0     6.0     17.0   0
13  222143.0  26040713.0         6.0     8.0     3.0     17.0   0
17       NaN  26033594.0  29445425.0  1707.0     5.0      NaN   0
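
As a sanity check, here is a small self-contained sketch that compares the original apply version with the two vectorized variants. It rebuilds only the columns used in the conditions, and the names do_apply, do_or and do_any are mine, not part of the question. Under those assumptions, chaining with | reproduces what seekDO returns, while chaining with & gives the output the question expected.

```python
import numpy as np
import pandas as pd

A = 0.8

# Sample rows copied from the question (unused columns omitted for brevity).
df = pd.DataFrame(
    {
        "s2": [0.0, 0.0, 6.0, 29445425.0],
        "s1": [0.0, 8.0, 8.0, 1707.0],
        "Qh_STA": [8.0, 6.0, 3.0, 5.0],
        "Qh_FID2": [17.0, 17.0, 17.0, np.nan],
    },
    index=[14, 15, 13, 17],
)


def seekDO(row):
    # Same logic as in the question: return 1 at the first check that passes.
    if row["Qh_STA"] / row["Qh_FID2"] < A:
        return 1
    if (row["Qh_STA"] + row["s1"]) / row["Qh_FID2"] < A:
        return 1
    if (row["Qh_STA"] + row["s1"] + row["s2"]) / row["Qh_FID2"] < A:
        return 1
    return 0


# Boolean masks computed on whole columns.
m1 = df["Qh_STA"] / df["Qh_FID2"] < A
m2 = (df["Qh_STA"] + df["s1"]) / df["Qh_FID2"] < A
m3 = (df["Qh_STA"] + df["s1"] + df["s2"]) / df["Qh_FID2"] < A

do_apply = df.apply(seekDO, axis=1)   # row-by-row version from the question
do_or = (m1 | m2 | m3).astype(int)    # any condition True
do_and = (m1 & m2 & m3).astype(int)   # all conditions True

print(do_apply.tolist())  # [1, 1, 1, 0]
print(do_or.tolist())     # [1, 1, 1, 0]
print(do_and.tolist())    # [1, 0, 0, 0]
```

Which operator to use depends on the intent: | if a single passing check is enough, & if every check must pass.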