How can I add a new column to a dataframe that uses a function to check whether the values from two columns fit specific criteria?
I have a dataframe similar to the one below:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [1.01, -2.02, None], 'col_2': [1.01, -2.02, None]}, columns=['col_1', 'col_2'])
```
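You can think of col_1 and col_2 as the x and y coordinates respectively. I need to check each row of this dataframe against three triangles on the coordinate plane and add a column 'col_3' to the dataframe that tells me which of the three triangles the point lies in.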
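For example, at index 0 we see the point (1.01, 1.01). Suppose I have 'triangle 1' with vertices (0.0, 0.0), (0.0, 2.1), (3.2, 0.0), 'triangle 2' with vertices (0.0, 0.0), (0.0, -3.1), (-3.1, 0.0), and 'triangle 3' with vertices (3.5, 0.0), (3.5, 3.5), (4.5, 0.0). Then the point at index 0, (1.01, 1.01), lies inside 'triangle 1', so col_3 at index 0 should have the value 'triangle 1'. The row at index 1, (-2.02, -2.02), would fall inside 'triangle 2', and index 2 would be None or empty because there is no point there.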
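I found a set of functions that work well for determining whether a point lies inside a triangle; I'm just not sure how to tie everything together: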
```python
import math

# A utility function to calculate area of triangle formed by (x1, y1), (x2, y2) and (x3, y3)
def area(x1, y1, x2, y2, x3, y3):
    return abs((x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0)

# A function to check whether point P(x, y) lies inside the triangle formed by A(x1, y1), B(x2, y2) and C(x3, y3)
def isInside(x1, y1, x2, y2, x3, y3, x, y):
    # Calculate area of triangle ABC
    A = area(x1, y1, x2, y2, x3, y3)
    # Calculate area of triangle PBC
    A1 = area(x, y, x2, y2, x3, y3)
    # Calculate area of triangle PAC
    A2 = area(x1, y1, x, y, x3, y3)
    # Calculate area of triangle PAB
    A3 = area(x1, y1, x2, y2, x, y)
    # Check if sum of A1, A2 and A3 is same as A
    # (math.isclose tolerates floating-point rounding, which exact == does not)
    return math.isclose(A, A1 + A2 + A3)

# Driver program to test above function
# Let us check whether the point P(10, 15) lies inside the triangle formed by A(0, 0), B(20, 0) and C(10, 30)
if isInside(0, 0, 20, 0, 10, 30, 10, 15):
    print('Inside')
else:
    print('Not Inside')
```
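The area test above works by comparing floating-point sums. Exact `==` comparisons on floats can fail because of rounding error, so a tolerance-based comparison such as the standard library's `math.isclose` is generally safer. A minimal illustration, unrelated to the triangles themselves:

```python
import math

# Exact equality on floats is sensitive to rounding error
exact = (0.1 + 0.2 == 0.3)
print(exact)  # False

# math.isclose compares within a relative tolerance (rel_tol=1e-09 by default)
close = math.isclose(0.1 + 0.2, 0.3)
print(close)  # True
```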
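In the `isInside` function above, the first 6 arguments differ for each triangle, and the last 2 arguments should be each row's col_1 and col_2 values. I tried a mess of `if` conditions but ended up with:

```
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().' and ValueError("The truth value of a {0} is ambiguous.").
```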
Any help would be appreciated!
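For context, that error typically appears when a whole pandas Series is used where Python expects a single boolean, for example in an `if` statement: a comparison such as `s > 0` is evaluated element-wise and returns one boolean per row. A minimal reproduction, using a hypothetical two-element Series `s`:

```python
import pandas as pd

s = pd.Series([1.01, -2.02])

message = None
try:
    # `if` needs one bool, but `s > 0` is a Series with one bool per row
    if s > 0:
        pass
except ValueError as err:
    message = str(err)
print(message)  # The truth value of a Series is ambiguous. ...

# Comparisons are element-wise, so the result stays aligned with the rows
mask = s > 0
print(mask.tolist())  # [True, False]
```

This is why the per-row check has to be done either one row at a time (e.g. with `DataFrame.apply(..., axis=1)`) or with vectorized operations on whole columns.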
You can try this:
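```python
import pandas as pd

def whereIsIt(row):
    x = row['col_1']
    y = row['col_2']
    # Missing coordinates are stored as NaN (not None) in a float column,
    # so check with pd.isna instead of `is None`
    if pd.isna(x) or pd.isna(y):
        return None
    # triangle 1: (0.0, 0.0), (0.0, 2.1), (3.2, 0.0)
    if isInside(0.0, 0.0, 0.0, 2.1, 3.2, 0.0, x, y):
        return 1
    # triangle 2: (0.0, 0.0), (0.0, -3.1), (-3.1, 0.0)
    elif isInside(0.0, 0.0, 0.0, -3.1, -3.1, 0.0, x, y):
        return 2
    # triangle 3: (3.5, 0.0), (3.5, 3.5), (4.5, 0.0)
    elif isInside(3.5, 0.0, 3.5, 3.5, 4.5, 0.0, x, y):
        return 3
    else:
        return None

# axis=1 passes each row to whereIsIt as a Series
df['col_3'] = df.apply(whereIsIt, axis=1)
df.head()
```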
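As an alternative to row-wise `apply` (not part of the original answer), the same area comparison can be vectorized with NumPy and combined with `np.select`, which also makes it easy to store labels such as 'triangle 1' as the question asks, instead of 1, 2, 3. This is a sketch assuming NumPy and pandas are installed; `area_vec`, `is_inside_vec` and `TRIANGLES` are illustrative names, and `np.isclose` (default tolerances) stands in for the exact `==` test:

```python
import numpy as np
import pandas as pd


def area_vec(x1, y1, x2, y2, x3, y3):
    # Same formula as area(), applied element-wise to NumPy arrays
    return np.abs((x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0)


def is_inside_vec(tri, x, y):
    # tri is ((x1, y1), (x2, y2), (x3, y3)); x and y are arrays of point coordinates
    (x1, y1), (x2, y2), (x3, y3) = tri
    whole = area_vec(x1, y1, x2, y2, x3, y3)
    parts = (area_vec(x, y, x2, y2, x3, y3)
             + area_vec(x1, y1, x, y, x3, y3)
             + area_vec(x1, y1, x2, y2, x, y))
    # Tolerant comparison; NaN coordinates give NaN areas and therefore False
    return np.isclose(whole, parts)


# Hypothetical label -> vertices mapping, using the triangles from the question
TRIANGLES = {
    'triangle 1': ((0.0, 0.0), (0.0, 2.1), (3.2, 0.0)),
    'triangle 2': ((0.0, 0.0), (0.0, -3.1), (-3.1, 0.0)),
    'triangle 3': ((3.5, 0.0), (3.5, 3.5), (4.5, 0.0)),
}

df = pd.DataFrame({'col_1': [1.01, -2.02, None], 'col_2': [1.01, -2.02, None]})
x = df['col_1'].to_numpy()
y = df['col_2'].to_numpy()

# One boolean array per triangle; np.select takes the first matching label per row
conditions = [is_inside_vec(tri, x, y) for tri in TRIANGLES.values()]
df['col_3'] = np.select(conditions, list(TRIANGLES), default=None)
print(df)
```

Like the `if`/`elif` chain, `np.select` returns the first matching label, so a point lying in two triangles (for example the shared vertex (0.0, 0.0) of 'triangle 1' and 'triangle 2') is assigned according to the order of `TRIANGLES`.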
By the way, the output is:
```
   col_1  col_2  col_3
0   1.01   1.01    1.0
1  -2.02  -2.02    NaN
2    NaN    NaN    NaN
```
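Note that (-2.02, -2.02) is not inside 'triangle 2' (vertices (0.0, 0.0), (0.0, -3.1), (-3.1, 0.0)), so the NaN at index 1 is expected rather than a bug in the function you provided: that triangle only contains points with x ≤ 0, y ≤ 0 and x + y ≥ -3.1, while here x + y = -4.04. :)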
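To double-check this with the area test itself, the sketch below prints the area of 'triangle 2' next to the sum of the three sub-triangle areas for (-2.02, -2.02) and for a hypothetical point (-1.0, -1.0) that does lie inside. `sub_areas` is an illustrative helper, not part of the question's code:

```python
import math


def area(x1, y1, x2, y2, x3, y3):
    return abs((x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0)


def sub_areas(tri, x, y):
    # Returns (area of the triangle, sum of the areas of PBC, PAC and PAB)
    (x1, y1), (x2, y2), (x3, y3) = tri
    whole = area(x1, y1, x2, y2, x3, y3)
    parts = (area(x, y, x2, y2, x3, y3)
             + area(x1, y1, x, y, x3, y3)
             + area(x1, y1, x2, y2, x, y))
    return whole, parts


triangle_2 = ((0.0, 0.0), (0.0, -3.1), (-3.1, 0.0))
results = {}
for point in [(-2.02, -2.02), (-1.0, -1.0)]:
    whole, parts = sub_areas(triangle_2, *point)
    results[point] = math.isclose(whole, parts)
    print(point, round(whole, 3), round(parts, 3), results[point])
# (-2.02, -2.02) 4.805 7.719 False
# (-1.0, -1.0) 4.805 4.805 True
```

For a point outside the triangle the three sub-triangles overlap past the far edge, so their areas add up to more than the triangle's own area.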