Trying to compare two data frames based on difference between latitude and longitude in each data frame
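I'm trying to compare latitude and longitude coordinates across two data frames. If the difference between latitude_fuze and latitude_air is less than .01, and the difference between longitude_fuze and longitude_air is less than .01, I want to set df_result['Type'] to 'Airport'. Basically, I have a DF with the latitude and longitude coordinates of airports, and if those coordinates are very close to the latitude and longitude coordinates in my business DF, I want to add a flag to the business DF indicating that the location is an airport.
Here is the code I'm testing: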
lat1 = df_result['latitude_fuze']
lon1 = df_result['longitude_fuze']
lat2 = df_airports['latitude_air']
lon2 = df_airports['longitude_air']
fuze_rows=range(df_result.shape[0])
air_rows=range(df_airports.shape[0])
for r in fuze_rows:
    lat = df_result.loc[r,lat1]
    max_lat = lat + .01
    min_lat = lat - .01
    lon = df_result.loc[r,lon1]
    max_lon = lon + .01
    min_lon = lon - .01
    for a in air_rows:
        if (min_lat <= df_airports.loc[a,lat2] <= max_lat) and (min_lon <= df_airports.loc[a,lon2] <= max_lon):
            df_result['Type'] = 'Airport'
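The `KeyError` most likely comes from `df_result.loc[r,lat1]`: `lat1` is a whole Series, not a column name, and the error lists the latitude values themselves, which suggests `.loc` is treating those float values as labels to look up. Below is a minimal corrected sketch of the same loop. It assumes the coordinate columns hold numeric strings, as in the sample data below. It passes column names to `.loc`, converts the strings to floats first, and writes `'Airport'` only to the matching row (the original `df_result['Type'] = 'Airport'` overwrites the entire column). The third business row near JFK is hypothetical and is included only so that something matches.

```python
import pandas as pd

# Sample data from the question (coordinates stored as strings);
# the third business row is hypothetical, placed near JFK so one row matches.
df_result = pd.DataFrame(
    [['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
     ['NY', 'NY', 'New York', '40.76', '73.98'],
     ['NY', 'Queens', 'Queens', '40.645', '-73.785']],
    columns=['state', 'city', 'county', 'latitude_fuze', 'longitude_fuze'])
df_airports = pd.DataFrame(
    [['New York', 'JFK', '40.64', '-73.78'],
     ['Chicago', 'ORD', '41.98', '-87.90']],
    columns=['municipality_name', 'airport_code', 'latitude_air', 'longitude_air'])

TOL = 0.01  # tolerance in degrees, as in the question

# Strings can't be offset by +/- TOL, so convert to floats first
fuze_cols = ['latitude_fuze', 'longitude_fuze']
air_cols = ['latitude_air', 'longitude_air']
df_result[fuze_cols] = df_result[fuze_cols].apply(pd.to_numeric)
df_airports[air_cols] = df_airports[air_cols].apply(pd.to_numeric)

df_result['Type'] = None  # no flag until a match is found

for r in df_result.index:
    # Use the column *name* as the .loc label, not the Series itself
    lat = df_result.loc[r, 'latitude_fuze']
    lon = df_result.loc[r, 'longitude_fuze']
    min_lat, max_lat = lat - TOL, lat + TOL
    min_lon, max_lon = lon - TOL, lon + TOL
    for a in df_airports.index:
        if (min_lat <= df_airports.loc[a, 'latitude_air'] <= max_lat
                and min_lon <= df_airports.loc[a, 'longitude_air'] <= max_lon):
            # Flag only row r; df_result['Type'] = ... would overwrite every row
            df_result.loc[r, 'Type'] = 'Airport'
            break  # one matching airport is enough
```

The nested loop performs len(df_result) × len(df_airports) Python-level lookups, so it is easy to follow but slow on large frames.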
Here are two sample data frames:
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
['NY', 'NY', 'New York', '40.76', '73.98'],
['NY', 'NY', 'New York', '40.76', '73.98']]
# Create the pandas DataFrame
df_result = pd.DataFrame(data, columns = ['state', 'city', 'county','latitude_fuze','longitude_fuze'])
# print dataframe.
df_result
And...
data = [['New York', 'JFK', '40.64', '-73.78'],
['New York', 'JFK', '40.64', '-73.78'],
['Los Angeles', 'LAX', '33.94', '-118.41'],
['Chicago', 'ORD', '41.98', '-87.90'],
['San Francisco', 'SFO', '37.62', '-122.38']]
# Create the pandas DataFrame
df_airports = pd.DataFrame(data, columns = ['municipality_name', 'airport_code', 'latitude_air','longitude_air'])
# print dataframe.
df_airports
When I run this code, I get this error:
KeyError: "None of [Float64Index([40.719515, 40.719515, 40.719515, 40.75682, 40.75682, 40.75682,\n 40.75682, 40.75682, 40.75682, 40.7646,\n ...\n 40.0006, 40.0006, 40.0006, 40.0006, 40.0006, 40.0006,\n 40.0006, 39.742417, 39.742417, 39.742417],\n dtype='float64', length=1720)] are in the [index]"
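I'm open to using KNN or the Haversine method if that's a better way to do the calculation. I'm not looking for distance here, just for how similar the latitude and longitude numbers are. If I do need to calculate distance to make this work, please let me know. Thanks, everyone.
I'm not sure what approach you need, because I'm not 100% clear on what you're trying to do. However, something like this may help get your current approach working: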
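If a distance-based check turns out to be preferable, the following haversine sketch is one option. It is an alternative to the fixed ±0.01° box and is not taken from the original post. A fixed degree tolerance covers less east-west ground at higher latitudes, because a degree of longitude shrinks with cos(latitude), whereas a great-circle distance threshold means the same thing everywhere. The sketch assumes the coordinate columns can be converted with `pd.to_numeric`. `haversine_km`, the 2 km cut-off `MAX_KM`, and the extra business row near JFK are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical business row near JFK
df_result = pd.DataFrame(
    [['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
     ['NY', 'NY', 'New York', '40.76', '73.98'],
     ['NY', 'Queens', 'Queens', '40.645', '-73.785']],
    columns=['state', 'city', 'county', 'latitude_fuze', 'longitude_fuze'])
df_airports = pd.DataFrame(
    [['New York', 'JFK', '40.64', '-73.78'],
     ['Chicago', 'ORD', '41.98', '-87.90']],
    columns=['municipality_name', 'airport_code', 'latitude_air', 'longitude_air'])

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
MAX_KM = 2.0  # hypothetical cut-off; choose one that suits your data


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees.

    Arguments are NumPy arrays that broadcast against each other.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Column vectors (n, 1) for businesses, row vectors (1, m) for airports
biz_lat = pd.to_numeric(df_result['latitude_fuze']).to_numpy()[:, None]
biz_lon = pd.to_numeric(df_result['longitude_fuze']).to_numpy()[:, None]
air_lat = pd.to_numeric(df_airports['latitude_air']).to_numpy()[None, :]
air_lon = pd.to_numeric(df_airports['longitude_air']).to_numpy()[None, :]

# Broadcasting yields an (n, m) matrix of business-to-airport distances
dist = haversine_km(biz_lat, biz_lon, air_lat, air_lon)

# A business is flagged if any airport lies within MAX_KM
df_result['Type'] = np.where((dist <= MAX_KM).any(axis=1), 'Airport', None)
```

The full n × m matrix is fine for a few thousand rows against a few thousand airports. For much larger tables, scikit-learn's `BallTree` with `metric='haversine'` (which expects coordinates in radians) may be worth evaluating instead.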
# place the two dataframes side by side; rows are paired by index label, so row i of df_result is only compared with row i of df_airports
df = pd.concat([df_result, df_airports], axis=1)
# cast latitudes and longitudes to numeric
cols = ["latitude_fuze", "latitude_air", "longitude_fuze", "longitude_air"]
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce', axis=1)
# create a mask where our conditions are met (difference between lat fuze and lat air < 0.1 and difference between long fuze and long air < 0.1)
mask = ((abs(df["latitude_fuze"] - df["latitude_air"]) < 0.1) & (abs(df["longitude_fuze"] - df["longitude_air"]) < 0.1))
# fill the type column
df.loc[mask, 'Type'] = "Airport"
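The concatenation above compares each business row only with the airport in the same row position. To check every business row against every airport, one option is a cross join. This sketch assumes pandas 1.2 or later (for `merge(how='cross')`) and uses the question's 0.01 tolerance. The extra business row near JFK is hypothetical.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical business row near JFK
df_result = pd.DataFrame(
    [['NY', 'Uniondale', 'Nassau', '40.72', '-73.59'],
     ['NY', 'NY', 'New York', '40.76', '73.98'],
     ['NY', 'Queens', 'Queens', '40.645', '-73.785']],
    columns=['state', 'city', 'county', 'latitude_fuze', 'longitude_fuze'])
df_airports = pd.DataFrame(
    [['New York', 'JFK', '40.64', '-73.78'],
     ['Chicago', 'ORD', '41.98', '-87.90']],
    columns=['municipality_name', 'airport_code', 'latitude_air', 'longitude_air'])

TOL = 0.01  # tolerance in degrees, as in the question

# Convert the string coordinates to floats
fuze_cols = ['latitude_fuze', 'longitude_fuze']
air_cols = ['latitude_air', 'longitude_air']
df_result[fuze_cols] = df_result[fuze_cols].apply(pd.to_numeric)
df_airports[air_cols] = df_airports[air_cols].apply(pd.to_numeric)

# Cartesian product: every business row paired with every airport
# (reset_index keeps df_result's row labels in a column named 'index')
pairs = df_result.reset_index().merge(df_airports, how='cross')

near = (((pairs['latitude_fuze'] - pairs['latitude_air']).abs() <= TOL)
        & ((pairs['longitude_fuze'] - pairs['longitude_air']).abs() <= TOL))

# Labels of business rows that are close to at least one airport
matched = pairs.loc[near, 'index'].unique()

df_result['Type'] = None
df_result.loc[matched, 'Type'] = 'Airport'
```

The cross join materializes len(df_result) × len(df_airports) rows, so memory use grows with the product of the two lengths. If that becomes a problem, the business frame can be processed in chunks.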