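How to pass columns in two data frames to Haversine Function?
I'm new to working with latitude and longitude. I found a Haversine function that looks interesting. I have two data frames that I'm trying to feed into the function, but I get an error.
Here is the function.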
import numpy as np
lon1 = df["longitude_fuze"]
lat1 = df["latitude_fuze"]
lon2 = df["longitude_air"]
lat2 = df["latitude_air"]
# Haversine
from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c
    return km
I'm trying to add the result as a column in the data frame, like this.
df['haversine_dist'] = haversine(lon1,lat1,lon2,lat2)
The function compiles fine, but when I try to call it, I get this error.
df['haversine_dist'] = haversine(lon1,lat1,lon2,lat2)
Traceback (most recent call last):
  File "<ipython-input-38-cc7e470610ee>", line 1, in <module>
    df['haversine_dist'] = haversine(lon1,lat1,lon2,lat2)
  File "<ipython-input-37-f357b0fc2e88>", line 16, in haversine
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
  File "C:\Users\ryans\anaconda3\lib\site-packages\pandas\core\series.py", line 129, in wrapper
    raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'float'>
Here are the two data frames I'm testing with.
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
        ['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
        ['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
        ['NY', 'NY', 'New York', '40.76', '73.98'],
        ['NY', 'NY', 'New York', '40.76', '73.98']]
# Create the pandas DataFrame
df_result = pd.DataFrame(data, columns = ['state', 'city', 'county','latitude_fuze','longitude_fuze'])
# print dataframe.
df_result
data = [['New York', 'JFK', '40.63', '-73.60'],
        ['New York', 'JFK', '40.64', '-73.78'],
        ['Los Angeles', 'LAX', '33.94', '-118.41'],
        ['Chicago', 'ORD', '40.98', '73.90'],
        ['San Francisco', 'SFO', '40.62', '73.38']]
# Create the pandas DataFrame
df_airports = pd.DataFrame(data, columns = ['municipality_name', 'airport_code', 'latitude_air','longitude_air'])
# print dataframe.
df_airports
I found the function at this link.
That's because you are passing whole Series, while the function expects single values.
# Below variables are going to have series data
lon1 = df["longitude_fuze"]
lat1 = df["latitude_fuze"]
lon2 = df["longitude_air"]
lat2 = df["latitude_air"]
Instead, you can pick the value at a specific index, for example the value at index 0:
lon1 = df["longitude_fuze"].iloc[0]
lat1 = df["latitude_fuze"].iloc[0]
lon2 = df["longitude_air"].iloc[0]
lat2 = df["latitude_air"].iloc[0]
With these values, you can now call your function:
df['haversine_dist'] = haversine(lon1,lat1,lon2,lat2)
Or, if you want to evaluate every row in these columns, you can do it in a loop:
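for i in df.index:
    # df.index yields labels, so look the values up with .loc (label-based);
    # float() guards against coordinates that are stored as strings
    lon1 = float(df.loc[i, "longitude_fuze"])
    lat1 = float(df.loc[i, "latitude_fuze"])
    lon2 = float(df.loc[i, "longitude_air"])
    lat2 = float(df.loc[i, "latitude_air"])
    df.loc[i, 'haversine_dist'] = haversine(lon1, lat1, lon2, lat2)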
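The same row-by-row idea can be written without explicit indexing. The following is a minimal sketch rather than the poster's code: it assumes the two frames are meant to be paired row by row, joins them side by side with pd.concat, and applies the scalar haversine to each row. The combined frame name df and the shortened sample data are assumptions made for illustration.

```python
from math import radians, cos, sin, asin, sqrt

import pandas as pd


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6367 * 2 * asin(sqrt(a))


# Shortened sample data (assumed); coordinates are strings, as in the question.
df_result = pd.DataFrame({"latitude_fuze": ["40.72", "40.76"],
                          "longitude_fuze": ["-73.59", "73.98"]})
df_airports = pd.DataFrame({"latitude_air": ["40.63", "40.98"],
                            "longitude_air": ["-73.60", "73.90"]})

# Put the two frames side by side and convert the coordinate strings to floats.
df = pd.concat([df_result, df_airports], axis=1).astype(float)

# apply(..., axis=1) hands one row at a time to the lambda, so haversine
# only ever receives single float values.
df["haversine_dist"] = df.apply(
    lambda row: haversine(row["longitude_fuze"], row["latitude_fuze"],
                          row["longitude_air"], row["latitude_air"]),
    axis=1,
)
print(df["haversine_dist"])
```

This is still a Python-level loop under the hood, so it is unlikely to be faster than the explicit loop; it mainly avoids the index bookkeeping.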
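I see two problems here:
The longitudes and latitudes are still strings in the data frame, so you are likely to run into data type problems.
The implementation of haversine used here does not work with array-like longitude and latitude objects.
The data type problem is easily solved with astype. For example, you can use lon1 = df["longitude_fuze"].astype(float). Or better yet, change the types directly in the data frame: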
dt_dict = {"longitude_fuze": float, "latitude_fuze": float,
           "longitude_air": float, "latitude_air": float}
df = df.astype(dt_dict)
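Note that astype(float) raises an error as soon as it meets a string it cannot parse. As a hedged alternative, pd.to_numeric with errors="coerce" turns such entries into NaN so they can be inspected afterwards. The sample frame and its 'n/a' entry below are made up for illustration.

```python
import pandas as pd

# Made-up sample: coordinates stored as strings, one of them not a number.
df = pd.DataFrame({
    "latitude_fuze": ["40.72", "40.76", "n/a"],
    "longitude_fuze": ["-73.59", "73.98", "-73.59"],
})

coord_cols = ["latitude_fuze", "longitude_fuze"]

# Convert each column; values that cannot be parsed become NaN instead of raising.
df[coord_cols] = df[coord_cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
print(df[df[coord_cols].isna().any(axis=1)])  # rows that failed to parse
```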
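As for a haversine function that supports array-like arguments: since it is fairly simple, I suggest reimplementing it so that it is compatible with numpy. I went ahead and did that for you: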
import numpy as np
def haversine_array(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(lambda x: x/360.*(2*np.pi), [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c
    return km
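NumPy also provides np.radians, which performs the same degree-to-radian conversion elementwise. The sketch below is a variant of the function above that uses it, followed by a quick check on two coordinate pairs taken from the question's data. The name haversine_np is an assumption, chosen only to keep it distinct from the function above.

```python
import numpy as np
import pandas as pd


def haversine_np(lon1, lat1, lon2, lat2):
    """Great-circle distance in km; accepts floats, arrays, or Series (decimal degrees)."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))


# Works on single values ...
single = haversine_np(-73.59, 40.72, -73.60, 40.63)

# ... and on whole columns at once.
lon1 = pd.Series([-73.59, 73.98])
lat1 = pd.Series([40.72, 40.76])
lon2 = pd.Series([-73.60, 73.90])
lat2 = pd.Series([40.63, 40.98])
dist = haversine_np(lon1, lat1, lon2, lat2)

print(single)
print(dist)
```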
Putting it all together:
import pandas as pd
import numpy as np
def haversine_array(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(lambda x: x/360.*(2*np.pi), [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c
    return km
# initialize list of lists
data = [['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
        ['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
        ['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
        ['NY', 'NY', 'New York', '40.76', '73.98'],
        ['NY', 'NY', 'New York', '40.76', '73.98']]
# Create the pandas DataFrame
df_result = pd.DataFrame(data, columns = ['state', 'city', 'county','latitude_fuze','longitude_fuze'])
data = [['New York', 'JFK', '40.63', '-73.60'],
        ['New York', 'JFK', '40.64', '-73.78'],
        ['Los Angeles', 'LAX', '33.94', '-118.41'],
        ['Chicago', 'ORD', '40.98', '73.90'],
        ['San Francisco', 'SFO', '40.62', '73.38']]
df_airports = pd.DataFrame(data, columns = ['municipality_name', 'airport_code', 'latitude_air','longitude_air'])
# note the conversion to float
lon1 = df_result["longitude_fuze"].astype(float)
lat1 = df_result["latitude_fuze"].astype(float)
lon2 = df_airports['longitude_air'].astype(float)
lat2 = df_airports['latitude_air'].astype(float)
# using the haversine implementation above
df_result['haversine_dist'] = haversine_array(lon1, lat1, lon2, lat2)
Now you will get:
>>> df_result['haversine_dist']
0      10.036708
1      18.314266
2    3987.270064
3      25.354970
4      52.895712
Name: haversine_dist, dtype: float64
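One thing worth keeping in mind with this approach: arithmetic between two pandas Series pairs values by index label, not by position. In the example both frames use the default index 0 to 4, so row i of df_result is matched with row i of df_airports. The small sketch below shows what happens when the indexes differ, and how .to_numpy() forces positional pairing. The shifted index is made up for illustration.

```python
import pandas as pd

lat1 = pd.Series([40.72, 40.76])                 # index labels 0, 1
lat2 = pd.Series([40.63, 40.98], index=[1, 2])   # index labels 1, 2 (made up)

# Label-based alignment: only label 1 exists on both sides,
# so labels 0 and 2 come out as NaN.
by_label = lat2 - lat1
print(by_label)

# Positional pairing: drop the index by converting to NumPy arrays first.
by_position = lat2.to_numpy() - lat1.to_numpy()
print(by_position)
```

If the two frames were filtered or sorted differently before the call, passing the .to_numpy() arrays (or resetting the index) is one way to make sure the rows are paired as intended.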
Hope this helps!