Calculating distance in pandas data frame giving error
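I have a data set containing two points as lat/long in four columns, and I am trying to calculate the distance between them in a newly added column using geopy.distance.
It works fine if I calculate a single value, but not for the entire column.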
import pandas as pd
from geopy import distance
sub_set = main[['Site_1','Site_Longitude_1','Site_Latitude_1','Site_2','Site_Longitude_2','Site_Latitude_2']]
lat1 = sub_set['Site_Latitude_1']
lat2 = sub_set['Site_Latitude_2']
long1 = sub_set['Site_Longitude_1']
long2 = sub_set['Site_Longitude_2']
The data frame sub_set looks like this:
Site_1 Site_Longitude_1 Site_Latitude_1 Site_2 Site_Longitude_2 Site_Latitude_2
0 A -118.645167 34.237917 A2 -118.6499422 34.24973484
1 A -118.645167 34.237917 A2 -118.6499422 34.24973484
2 B -118.626659 34.224762 A2 -118.6499422 34.24973484
3 B -118.626659 34.224762 A2 -118.6499422 34.24973484
4 B -118.626659 34.224762 A2 -118.6499422 34.24973484
Executing
sub_set['Distance'] = distance.distance((lat1,long1),(lat2,long2)).miles
throws the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
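- The following gives you the required row-wise calculation.
- There is no need for the subset step.
- It is a long line, but selecting the four columns explicitly fixes their positions, so the positional lookups are unambiguous.
df['Distance'] = df[['Site_Latitude_1', 'Site_Longitude_1', 'Site_Latitude_2', 'Site_Longitude_2']].apply(lambda x: distance.distance((x.iloc[0], x.iloc[1]), (x.iloc[2], x.iloc[3])).miles, axis=1)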
A shorter line of code
- Just make sure the positions passed to x.iloc[] point at the correct columns in df
df['Distance'] = df.apply(lambda x: distance.distance((x.iloc[2], x.iloc[1]), (x.iloc[5], x.iloc[4])).miles, axis=1)
Output:
Site_1 Site_Longitude_1 Site_Latitude_1 Site_2 Site_Longitude_2 Site_Latitude_2 Distance
0 A -118.645167 34.237917 A2 -118.6499422 34.24973484 0.859202
1 A -118.645167 34.237917 A2 -118.6499422 34.24973484 0.859202
2 B -118.626659 34.224762 A2 -118.6499422 34.24973484 2.177003
3 B -118.626659 34.224762 A2 -118.6499422 34.24973484 2.177003
4 B -118.626659 34.224762 A2 -118.6499422 34.24973484 2.177003
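I'm new here, but I think I can help.
The problem is that you are passing Series to a method that expects single values.
You should iterate over the rows to select each value individually.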
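A minimal sketch of why a whole column triggers this error, assuming only pandas is installed: a Series with more than one element has no single truth value, so any code that evaluates it as a boolean raises the same ValueError. Where exactly geopy does this internally is not shown here; `scalar_only` is a hypothetical stand-in for any function written for single values.

```python
import pandas as pd

def scalar_only(value):
    # Hypothetical stand-in for a function written for single numbers:
    # `value or 0.0` evaluates the truth value of `value`.
    return float(value or 0.0)

lat1 = pd.Series([34.237917, 34.224762])

print(scalar_only(34.237917))  # a single float is fine

try:
    scalar_only(lat1)          # a whole column is not
    message = None
except ValueError as exc:
    message = str(exc)
    print(message)

# Handing over one element at a time avoids the problem.
per_row = [scalar_only(v) for v in lat1]
```

The same reasoning applies to the latitude and longitude columns in the question: each call needs one pair of numbers, not four Series.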
Try this code:
for row in sub_set.index:
    site1 = (sub_set.loc[row, 'Site_Latitude_1'], sub_set.loc[row, 'Site_Longitude_1'])
    site2 = (sub_set.loc[row, 'Site_Latitude_2'], sub_set.loc[row, 'Site_Longitude_2'])
    print('Distance is:', distance.distance(site1, site2).miles, 'miles')
Output:
Distance is: 0.8592022243334677 miles
Distance is: 0.8592022243334677 miles
Distance is: 2.1770033222544773 miles
Distance is: 2.1770033222544773 miles
Distance is: 2.1770033222544773 miles
Or:
dist = []
for row in sub_set.index:
    site1 = (sub_set.loc[row, 'Site_Latitude_1'], sub_set.loc[row, 'Site_Longitude_1'])
    site2 = (sub_set.loc[row, 'Site_Latitude_2'], sub_set.loc[row, 'Site_Longitude_2'])
    dist.append(distance.distance(site1, site2).miles)
sub_set['Distance'] = dist
Output:
Site_1 Site_Longitude_1 Site_Latitude_1 Site_2 Site_Longitude_2 Site_Latitude_2 Distance
0 A -118.645167 34.237917 A2 -118.649942 34.249735 0.859202
1 A -118.645167 34.237917 A2 -118.649942 34.249735 0.859202
2 B -118.626659 34.224762 A2 -118.649942 34.249735 2.177003
3 B -118.626659 34.224762 A2 -118.649942 34.249735 2.177003
4 B -118.626659 34.224762 A2 -118.649942 34.249735 2.177003
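For large frames, looping row by row can be slow. One alternative, sketched here as an assumption rather than a drop-in replacement, is to compute a great-circle (haversine) distance on whole columns with NumPy. This is a different formula from the ellipsoidal geodesic that geopy uses by default, so the results differ slightly; the function name `haversine_miles` and the mean Earth radius of 3958.8 miles are choices made for this example.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_MILES = 3958.8  # mean radius; an assumption of the spherical model

def haversine_miles(lat1, lon1, lat2, lon2):
    # Works element-wise on NumPy arrays or pandas Series given in degrees.
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

# Two sample rows taken from the question's table.
df = pd.DataFrame({
    'Site_Longitude_1': [-118.645167, -118.626659],
    'Site_Latitude_1': [34.237917, 34.224762],
    'Site_Longitude_2': [-118.6499422, -118.6499422],
    'Site_Latitude_2': [34.24973484, 34.24973484],
})

# Whole columns go in, a whole column comes out; no explicit loop.
df['Distance'] = haversine_miles(
    df['Site_Latitude_1'], df['Site_Longitude_1'],
    df['Site_Latitude_2'], df['Site_Longitude_2'],
)
print(df['Distance'])
```

If the exact geodesic values are required, the row-wise geopy approaches above remain the safer choice.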