Comparing values of a column with a variable and making a new column
I have a dataframe like this:
          Patch  Last reward  First reward  Difference Name  Block_No.
group_id
1             3          0.0           0.0         0.0  XYZ          1
2             4         43.0          54.0        11.0  XYZ          1
3             5          0.0           0.0         0.0  XYZ          2
4             6         40.0          65.0        25.0  XYZ          2
5             7          0.0           0.0         0.0  XYZ          3
6             0          0.0           0.0         0.0  XYZ          3
I want to create a new column named 'Rep_rate' based on the following condition:
if Block_No. = 1 and Patch = 3, then Rep_rate = 4; otherwise Rep_rate = 0.
I tried this:
if (df_last['Block_No.']) == 1:
    for i in range (len(df_last)):
        if df_last['Patch'][i] == 1:
            rep = 8
        else:
            rep = 0
        df_last['Rep_Rate'] = rep
if (df_last['Block_No.']) == 2:
    for i in range (len(df_last)):
        if df_last['Patch'][i] == 1:
            rep = 4
        else:
            rep = 0
        df_last['Rep_Rate'] = rep
if (df_last['Block_No.']) == 3:
    for i in range (len(df_last)):
        if df_last['Patch'][i] == 1:
            rep = 8
        else:
            rep = 0
        df_last['Rep_Rate'] = rep
However, when I try this, I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I don't understand the logic you are using to fill the RR column, but one approach might be:
df['RR'] = 0
if (df['Block'] == 1).all():
    for i, row in df.iterrows():
        if row['Patch'] == 3:
            df.loc[i,'RR'] = 4  # note using df.loc to directly edit the dataframe
elif (df['Block'] == x).all():
    for i, row in df.iterrows():
        if row['Patch'] == y:
            df.loc[i,'RR'] = z
# more if statements as needed
Replace x, y and z with whatever values you need.
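The problem you are hitting is if df["Block"] == 1. The expression df["Block"] == 1 produces a boolean Series of True/False values, one per row, indicating whether each value in the Series equals 1. Using an if statement on a Series is not supported because its meaning is ambiguous. Pandas provides ser.any() and ser.all() for unambiguous boolean evaluation. E.g.:
if (df["Block"] == 1).all():
    # more code ...
The code inside the if statement runs only if all values in the Block column equal 1.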
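To make the ambiguity concrete, here is a minimal sketch. The small dataframe is a made-up stand-in for the question's data, and the variable names are only illustrative. It shows that a bare `if` on the comparison raises the same ValueError, while `.any()` and `.all()` each reduce the Series to a single boolean.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's Block_No. column
df = pd.DataFrame({"Block_No.": [1, 1, 2, 2, 3, 3]})

mask = df["Block_No."] == 1   # element-wise comparison -> boolean Series

try:
    if mask:                  # ambiguous: which of the six values should decide?
        pass
    raised = False
except ValueError:
    raised = True             # pandas refuses to guess

any_one = bool(mask.any())    # at least one row has Block_No. == 1
all_one = bool(mask.all())    # every row has Block_No. == 1
print(raised, any_one, all_one)
```

Note that neither `.any()` nor `.all()` gives a per-row decision, which is what the question actually asks for.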
You may need to adjust the logic (I don't fully understand it either). However, looping over rows is not a good idea. So...
Try:
import pandas as pd

# create a test dataframe
df = pd.DataFrame({'Block_No.': [1, 1, 2, 2, 3, 3], 'Patch': [3, 4, 5, 6, 7, 0]})

# create a function to return 4 or 0 depending on the values of 'Block_No.' and 'Patch'
def fillRep_rate(block, patch):
    # block and patch are scalars here, so plain `and` is the idiomatic operator
    if block == 1 and patch == 3:
        return 4
    # more options to return other values if required...
    # elif block == 2 and patch == 4:
    #     return 5
    # elif block == 3 and patch == 5:
    #     return 6
    else:
        return 0

# create a new column called 'Rep_rate' and populate it with the value returned from the function above
df['Rep_rate'] = df.apply(lambda x: fillRep_rate(x['Block_No.'], x['Patch']), axis=1)
print(df)
Output:
   Block_No.  Patch  Rep_rate
0          1      3         4
1          1      4         0
2          2      5         0
3          2      6         0
4          3      7         0
5          3      0         0
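Since `apply` with `axis=1` still calls a Python function once per row, a vectorized variant may be worth considering for larger frames. The following is a sketch rather than part of the original answer: it assumes NumPy is available and swaps in `numpy.where` over a combined boolean mask, reusing the same test dataframe.

```python
import numpy as np
import pandas as pd

# same test dataframe as in the answer above
df = pd.DataFrame({'Block_No.': [1, 1, 2, 2, 3, 3], 'Patch': [3, 4, 5, 6, 7, 0]})

# element-wise conditions combined with & (parentheses are required
# because & binds more tightly than ==)
cond = (df['Block_No.'] == 1) & (df['Patch'] == 3)

# 4 where the condition holds, 0 everywhere else
df['Rep_rate'] = np.where(cond, 4, 0)
print(df)
```

Here `&` is the right operator because both operands are Series; `and` would trigger the same "truth value of a Series is ambiguous" error as in the question.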
You can use df.query() for this.
For example, suppose this is your dataframe:
Patch Block_No.
3 1
4 1
5 2
6 2
7 3
0 3
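# add a new column and set values to 0
df['Rep_rate'] = 0
# run the query and store the matching index labels in a list
# ('Block_No.' is not a valid Python identifier, so it is quoted with
# backticks, which recent pandas versions support inside query strings)
idx = df.query('Patch == 3 & `Block_No.` == 1').index.tolist()
# update values in the Rep_rate column using this list
df.loc[idx, 'Rep_rate'] = 4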
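If several (block, patch) pairs each need their own value, as the commented-out `elif` branches in the `apply`-based answer hint at, one possible sketch uses `numpy.select`. The extra pairs and values below are the hypothetical ones from those comments, not rules stated in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Patch': [3, 4, 5, 6, 7, 0], 'Block_No.': [1, 1, 2, 2, 3, 3]})

# each condition is a boolean Series; the first matching one wins for a row
conditions = [
    (df['Block_No.'] == 1) & (df['Patch'] == 3),
    (df['Block_No.'] == 2) & (df['Patch'] == 4),   # hypothetical rule
    (df['Block_No.'] == 3) & (df['Patch'] == 5),   # hypothetical rule
]
values = [4, 5, 6]

# rows matching none of the conditions fall back to the default of 0
df['Rep_rate'] = np.select(conditions, values, default=0)
print(df)
```

With this sample data only the first rule matches a row, so the other two leave the default of 0 in place.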