Add column based on multiple conditions
I have a silly question. My df looks like this:
        FID_2      STA_SID           s2       s1  Qh_STA  Qh_FID2
14  222143.00  26040713.00         0.00     0.00    8.00    17.00
15  222143.00  26040713.00         0.00     8.00    6.00    17.00
13  222143.00  26040713.00         6.00     8.00    3.00    17.00
17        NaN  26033594.00  29445425.00  1707.00    5.00      nan
I defined the following function and command:
A = 0.8
def seekDO(row):
    if row['Qh_STA'] / row['Qh_FID2'] < A:
        return 1
    if (row['Qh_STA'] + row['s1']) / row['Qh_FID2'] < A:
        return 1
    if (row['Qh_STA'] + row['s1'] + row['s2']) / row['Qh_FID2'] < A:
        return 1
    return 0

df['DO'] = df.apply(lambda row: seekDO(row), axis=1)
The problem is the DO column. I get
DO
14 1
15 1
13 1
17 0
instead of
DO
14 1
15 0
13 0
17 0
Can you see where I went wrong?
Maybe np.where:
import numpy as np
# Each sum is divided by Qh_FID2 before the comparison with A, matching seekDO
condition = (df['Qh_STA'] / df['Qh_FID2'] < A) | ((df['Qh_STA'] + df['s1']) / df['Qh_FID2'] < A) | ((df['Qh_STA'] + df['s1'] + df['s2']) / df['Qh_FID2'] < A)
df['DO'] = np.where(condition, 1, 0)
But you should get
DO
14 1
15 1
13 1
17 0
indeed.
Look at your values again.
8 / 17 IS < 0.8
6 / 17 IS < 0.8
3 / 17 IS < 0.8
The output is correct; the output you expected is wrong.
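To check the arithmetic, here is a minimal sketch in plain Python that recomputes the three ratios tested by seekDO for each row. The `rows` dict and the `ratios` helper are made up for illustration; the numbers are copied from the question's table.

```python
import math

A = 0.8

# Values copied from the question's table: index -> (s2, s1, Qh_STA, Qh_FID2)
rows = {
    14: (0.0, 0.0, 8.0, 17.0),
    15: (0.0, 8.0, 6.0, 17.0),
    13: (6.0, 8.0, 3.0, 17.0),
    17: (29445425.0, 1707.0, 5.0, math.nan),
}

def ratios(s2, s1, qh_sta, qh_fid2):
    """Return the three ratios tested by seekDO, in the same order."""
    return (
        qh_sta / qh_fid2,
        (qh_sta + s1) / qh_fid2,
        (qh_sta + s1 + s2) / qh_fid2,
    )

results = {}
for idx, values in rows.items():
    # Any comparison against NaN is False, so row 17 fails every check
    checks = [r < A for r in ratios(*values)]
    # seekDO returns 1 as soon as a single check passes
    results[idx] = int(any(checks))
    print(idx, checks, "->", results[idx])
```

The first check already passes for rows 14, 15 and 13, so the later checks never get a chance to turn the result into 0.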
I believe you can test each condition on whole columns instead of looping over rows, which is slow:
A = 0.8
m1 = df['Qh_STA']/df['Qh_FID2'] < A
m2 = (df['Qh_STA'] + df['s1'])/df['Qh_FID2'] < A
m3 = (df['Qh_STA'] + df['s1'] + df['s2']) / df['Qh_FID2'] < A
Then, if all conditions must be True, chain the masks with & for AND:
df['DO'] = (m1 & m2 & m3).astype(int)
print (df)
       FID_2     STA_SID          s2      s1  Qh_STA  Qh_FID2  DO
14  222143.0  26040713.0         0.0     0.0     8.0     17.0   1
15  222143.0  26040713.0         0.0     8.0     6.0     17.0   0
13  222143.0  26040713.0         6.0     8.0     3.0     17.0   0
17       NaN  26033594.0  29445425.0  1707.0     5.0      NaN   0
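For comparison, a small self-contained sketch that rebuilds the sample data and puts the two ways of combining the masks side by side. The column names `DO_any` and `DO_all` are invented here for illustration, and the DataFrame only contains the four columns the conditions use.

```python
import numpy as np
import pandas as pd

A = 0.8

# Sample data rebuilt from the question's table (only the columns used)
df = pd.DataFrame(
    {
        "s2": [0.0, 0.0, 6.0, 29445425.0],
        "s1": [0.0, 8.0, 8.0, 1707.0],
        "Qh_STA": [8.0, 6.0, 3.0, 5.0],
        "Qh_FID2": [17.0, 17.0, 17.0, np.nan],
    },
    index=[14, 15, 13, 17],
)

m1 = df["Qh_STA"] / df["Qh_FID2"] < A
m2 = (df["Qh_STA"] + df["s1"]) / df["Qh_FID2"] < A
m3 = (df["Qh_STA"] + df["s1"] + df["s2"]) / df["Qh_FID2"] < A

# | gives 1 if at least one condition holds (equivalent to seekDO)
df["DO_any"] = (m1 | m2 | m3).astype(int)
# & gives 1 only if every condition holds (the output the question expected)
df["DO_all"] = (m1 & m2 & m3).astype(int)

print(df[["DO_any", "DO_all"]])
```

Comparisons against NaN evaluate to False, so row 17 ends up as 0 with either operator.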