Comparing the values of a column with a variable and creating a new column

I have a dataframe like this:

           Patch  Last reward  First reward  Difference    Name  Block_No.
group_id                                                                 
1             3          0.0           0.0         0.0  XYZ          1
2             4         43.0          54.0        11.0  XYZ          1
3             5          0.0           0.0         0.0  XYZ          2
4             6         40.0          65.0        25.0  XYZ          2
5             7          0.0           0.0         0.0  XYZ          3
6             0          0.0           0.0         0.0  XYZ          3

I want to create a new column called 'Rep_rate' based on the following condition: if Block_No. == 1 and Patch == 3, then Rep_rate = 4; otherwise Rep_rate = 0.

I tried this:

if (df_last['Block_No.']) == 1:
    for i in range(len(df_last)):
        if df_last['Patch'][i] == 1:
            rep = 8
        else:
            rep = 0
        df_last['Rep_Rate'] = rep

if (df_last['Block_No.']) == 2:
    for i in range(len(df_last)):
        if df_last['Patch'][i] == 1:
            rep = 4
        else:
            rep = 0
        df_last['Rep_Rate'] = rep

if (df_last['Block_No.']) == 3:
    for i in range(len(df_last)):
        if df_last['Patch'][i] == 1:
            rep = 8
        else:
            rep = 0
        df_last['Rep_Rate'] = rep

However, when I try this, I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().


   

I don't understand the logic you are using to fill the RR column, but one approach could be:

df['RR'] = 0
if (df['Block'] == 1).all():
    for i, row in df.iterrows():
        if row['Patch'] == 3:
            df.loc[i,'RR'] = 4 # note using df.loc to directly edit the dataframe
elif (df['Block'] == x).all():
    for i, row in df.iterrows():
        if row['Patch'] == y:
            df.loc[i,'RR'] = z
# more if statements as needed

Replace x, y, and z with whatever values you need.
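The snippet above gates each loop on the whole column with .all(), so with the sample data, where Block_No. takes several different values, the Block == 1 branch would never run. If the intent is to test each row instead, here is a minimal sketch that moves the Block check inside the loop, where both values are scalars. It assumes the question's column names ('Block_No.', 'Patch', 'Rep_rate') and the stated rule (Block_No. 1 and Patch 3 gives 4, everything else 0):

```python
import pandas as pd

# Test data mirroring the question's Block_No. and Patch columns
df = pd.DataFrame({'Block_No.': [1, 1, 2, 2, 3, 3], 'Patch': [3, 4, 5, 6, 7, 0]})

df['Rep_rate'] = 0  # default value for every row
for i, row in df.iterrows():
    # row['Block_No.'] and row['Patch'] are single values, so a plain `if` works
    if row['Block_No.'] == 1 and row['Patch'] == 3:
        df.loc[i, 'Rep_rate'] = 4  # write back into the dataframe by index label

print(df)
```

This still loops in Python, so it is fine for small frames but slow for large ones.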

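The problem you are running into is `if df["Block"] == 1`. The expression df["Block"] == 1 produces a boolean Series of True/False values, one per element, indicating whether each value equals 1. Using a Series in an if statement is not supported because its truth value is ambiguous: should the condition hold if any element is True, or only if all of them are? Pandas provides ser.any() and ser.all() for unambiguous boolean evaluation. E.g.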

if (df["Block"] == 1).all():
    # more code ...

The code inside the if statement will only run if every value in the Block column equals 1.
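Here is a minimal sketch that reproduces the error on the question's sample columns and shows the explicit reductions. The try/except is only there to capture the message:

```python
import pandas as pd

df = pd.DataFrame({'Block_No.': [1, 1, 2, 2, 3, 3], 'Patch': [3, 4, 5, 6, 7, 0]})

# Element-wise comparison: one True/False per row, not a single bool
mask = df['Block_No.'] == 1

error_message = None
try:
    if mask:  # `if` needs one bool, so pandas raises ValueError here
        pass
except ValueError as exc:
    error_message = str(exc)

print(error_message)
print(mask.any())  # True: at least one row has Block_No. == 1
print(mask.all())  # False: not every row has Block_No. == 1
```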

You may need to adjust the logic a bit (I don't fully understand it either). However, looping over rows is not very efficient. So...

Try:

import pandas as pd

# create a test dataframe
df = pd.DataFrame({'Block_No.': [1, 1, 2, 2, 3, 3], 'Patch': [3, 4, 5, 6, 7, 0]})

# return 4 or 0 depending on the values of 'Block_No.' and 'Patch'
def fillRep_rate(block, patch):
    # block and patch are scalars here, so plain `and` is the right operator
    if block == 1 and patch == 3:
        return 4
    # more options to return other values if required...
    # elif block == 2 and patch == 4:
    #     return 5
    # elif block == 3 and patch == 5:
    #     return 6
    else:
        return 0

# create a new column called 'Rep_rate' and fill it with the value returned by the function above
df['Rep_rate'] = df.apply(lambda x: fillRep_rate(x['Block_No.'], x['Patch']), axis=1)

print(df)

Output:

   Block_No.  Patch  Rep_rate
0          1      3         4
1          1      4         0
2          2      5         0
3          2      6         0
4          3      7         0
5          3      0         0
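apply still calls the Python function once per row. A vectorized alternative is numpy.select, which evaluates whole boolean columns at once and picks the choice for the first condition that matches, falling back to a default. This sketch assumes the same test dataframe. The extra rules for blocks 2 and 3 are the hypothetical ones from the commented-out branches above, not rules stated in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Block_No.': [1, 1, 2, 2, 3, 3], 'Patch': [3, 4, 5, 6, 7, 0]})

# One boolean Series per rule; use & (not `and`) to combine Series element-wise
conditions = [
    (df['Block_No.'] == 1) & (df['Patch'] == 3),
    (df['Block_No.'] == 2) & (df['Patch'] == 4),  # hypothetical extra rule
    (df['Block_No.'] == 3) & (df['Patch'] == 5),  # hypothetical extra rule
]
choices = [4, 5, 6]

# First matching condition wins; rows matching none get the default
df['Rep_rate'] = np.select(conditions, choices, default=0)
print(df)
```

With only one rule, np.where(condition, 4, 0) does the same job.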

You can use df.query() for this. For example, suppose this is your dataframe:

   Patch  Block_No.
0      3          1
1      4          1
2      5          2
3      6          2
4      7          3
5      0          3

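# add a new column and set every value to 0
df['Rep_rate'] = 0

# run the query and store the matching index labels in a list
# (backticks are needed because 'Block_No.' is not a valid Python identifier)
idx = df.query('Patch == 3 & `Block_No.` == 1').index.tolist()

# update the values in the Rep_rate column for those rows
# (note: no trailing space in the column name, or a second column gets created)
df.loc[idx, 'Rep_rate'] = 4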
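The query string is just a boolean condition, so the same update can also be written as a boolean mask passed straight to .loc, which skips the intermediate index list and the backtick quoting. Here is a minimal sketch using the example dataframe above:

```python
import pandas as pd

df = pd.DataFrame({'Patch': [3, 4, 5, 6, 7, 0], 'Block_No.': [1, 1, 2, 2, 3, 3]})

# default every row to 0
df['Rep_rate'] = 0

# same condition as the query string, built from element-wise comparisons
mask = (df['Patch'] == 3) & (df['Block_No.'] == 1)

# set Rep_rate only on the rows where mask is True
df.loc[mask, 'Rep_rate'] = 4
print(df)
```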