How can I add a new column to a dataframe that uses a function to check whether values from two columns fit specific criteria?

I have a dataframe similar to the one below:

import pandas as pd

df = pd.DataFrame({'col_1': [1.01, -2.02, None], 'col_2': [1.01, -2.02, None]}, columns=['col_1', 'col_2'])

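You can think of col_1 and col_2 as x and y coordinates, respectively. I need to check each row of this dataframe against three triangles on the coordinate plane and add a column 'col_3' to the dataframe that tells me which of the three triangles the point lies in.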

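For example, at index 0 we have the point (1.01, 1.01). If I have

'triangle 1' with points (0.0, 0.0), (0.0, 2.1), (3.2, 0.0),

'triangle 2' with points (0.0, 0.0), (0.0, -3.1), (-3.1, 0.0),

'triangle 3' with points (3.5, 0.0), (3.5, 3.5), (4.5, 0.0),

then the point at index 0, (1.01, 1.01), would lie inside 'triangle 1', and col_3 at index 0 would have the value 'triangle 1'. The row at index 1, (-2.02, -2.02), would fall inside 'triangle 2', and index 2 would be None or null because there is no point there.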

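I found a set of functions that works well for determining whether a point lies inside a triangle; I'm just not sure how to tie everything together: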

# A utility function to calculate area of triangle formed by (x1, y1), (x2, y2) and (x3, y3)
def area(x1, y1, x2, y2, x3, y3):
    return abs((x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0)

# A function to check whether point P(x, y) lies inside the triangle formed by A(x1, y1), B(x2, y2) and C(x3, y3)
def isInside(x1, y1, x2, y2, x3, y3, x, y):
    # Calculate area of triangle ABC
    A = area (x1, y1, x2, y2, x3, y3)
    # Calculate area of triangle PBC
    A1 = area (x, y, x2, y2, x3, y3)
    # Calculate area of triangle PAC
    A2 = area (x1, y1, x, y, x3, y3)
    # Calculate area of triangle PAB
    A3 = area (x1, y1, x2, y2, x, y)

    # Check if sum of A1, A2 and A3 is same as A
    if(A == A1 + A2 + A3):
        return True
    else:
        return False

# Driver program to test above function
# Let us check whether the point P(10, 15) lies inside the triangle formed by A(0, 0), B(20, 0) and C(10, 30)
if (isInside(0, 0, 20, 0, 10, 30, 10, 15)):
    print('Inside')
else:
    print('Not Inside')

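In the isInside function above, the first 6 arguments differ for each triangle, and the last 2 arguments should be the col_1 and col_2 values of each row. I tried a tangle of "if" conditions but ended up with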

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().' and ValueError("The truth value of a {0} is ambiguous.").

Any help would be greatly appreciated!

You can try this:

def whereIsIt(row):
    x = row['col_1']
    y = row['col_2']
    # None is stored as NaN in a float column, so test with pd.isna rather than "is None"
    if pd.isna(x) or pd.isna(y):
        return None
    #(0.0, 0.0), (0.0, 2.1), (3.2, 0.0)
    if isInside(0.0,0.0,0.0,2.1,3.2,0.0,x,y):
        return 1
    # (0.0, 0.0), (0.0, -3.1), (-3.1, 0.0)
    elif isInside(0.0,0.0,0.0,-3.1,-3.1,0.0,x,y):
        return 2
    #(3.5, 0.0), (3.5, 3.5), (4.5, 0.0)
    elif isInside(3.5,0.0,3.5,3.5,4.5,0.0,x,y):
        return 3
    else:
        return None

df['col_3'] = df.apply(whereIsIt, axis=1)
df.head()
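
If you would rather get the labels from the question ('triangle 1' and so on) than the numbers 1 to 3, one possible variation is to keep the triangles in a dict and loop over them. This is only a sketch: `TRIANGLES`, `is_inside` and `which_triangle` are names I made up for illustration, and comparing the areas with `math.isclose` instead of `==` is my own assumption to guard against floating-point rounding, not part of the function you posted.

```python
import math
import pandas as pd

def area(x1, y1, x2, y2, x3, y3):
    # Same area formula as in the question
    return abs((x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0)

def is_inside(tri, x, y):
    # tri is a tuple of three (x, y) vertices (hypothetical layout)
    (x1, y1), (x2, y2), (x3, y3) = tri
    total = area(x1, y1, x2, y2, x3, y3)
    parts = (area(x, y, x2, y2, x3, y3)
             + area(x1, y1, x, y, x3, y3)
             + area(x1, y1, x2, y2, x, y))
    # Assumption: tolerate tiny rounding differences instead of exact equality
    return math.isclose(total, parts, rel_tol=1e-9)

TRIANGLES = {
    'triangle 1': ((0.0, 0.0), (0.0, 2.1), (3.2, 0.0)),
    'triangle 2': ((0.0, 0.0), (0.0, -3.1), (-3.1, 0.0)),
    'triangle 3': ((3.5, 0.0), (3.5, 3.5), (4.5, 0.0)),
}

def which_triangle(row):
    x, y = row['col_1'], row['col_2']
    if pd.isna(x) or pd.isna(y):
        return None  # no point in this row
    for name, tri in TRIANGLES.items():
        if is_inside(tri, x, y):
            return name  # first matching triangle wins
    return None

df = pd.DataFrame({'col_1': [1.01, -2.02, None], 'col_2': [1.01, -2.02, None]})
df['col_3'] = df.apply(which_triangle, axis=1)
print(df)
```

Adding a fourth triangle then only means adding one entry to the dict, with no extra elif branch.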

By the way, the output is:

   col_1  col_2  col_3
0   1.01   1.01    1.0
1  -2.02  -2.02    NaN
2    NaN    NaN    NaN

Either (-2.02, -2.02) is not inside 'triangle 2', whose points are (0.0, 0.0), (0.0, -3.1), (-3.1, 0.0), or the function you provided is wrong. :)
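
For what it's worth, the point really does seem to be outside: the long edge of 'triangle 2' lies on the line x + y = -3.1, and -2.02 + -2.02 = -4.04, which is beyond it. To double-check without relying on the area-based function, here is a rough sketch of a different point-in-triangle test based on the sign of cross products. `side` and `inside_by_sign` are illustrative names, not anything from the question.

```python
def side(ax, ay, bx, by, px, py):
    # Cross product of AB and AP: its sign says which side of line AB the point P is on
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def inside_by_sign(tri, px, py):
    (ax, ay), (bx, by), (cx, cy) = tri
    d1 = side(ax, ay, bx, by, px, py)
    d2 = side(bx, by, cx, cy, px, py)
    d3 = side(cx, cy, ax, ay, px, py)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # Inside (or on an edge) when P is never on opposite sides of two edges.
    # Caveat: NaN inputs must be filtered out first, since NaN comparisons are all False.
    return not (has_neg and has_pos)

triangle_2 = ((0.0, 0.0), (0.0, -3.1), (-3.1, 0.0))
print(inside_by_sign(triangle_2, -2.02, -2.02))  # False: beyond the edge x + y = -3.1
print(inside_by_sign(triangle_2, -1.0, -1.0))    # True: -1.0 + -1.0 = -2.0 is within it
```

Both tests agree here, so the function you found looks fine for this case; the expected result in the question is what is off.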