How to use Python to calculate the distance between lat/long points from imported csv?

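I am trying to import a .csv file containing four columns of location data (lat/long), calculate the distance between each pair of points, write the distance to a new column, repeat the calculation for the next set of coordinates, and write the resulting DataFrame to a new .csv. I wrote the code below, but it fails with an error after these steps.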

Sample data:

lat1       lon1        lat2       lon2
33.58144   -57.73018   32.44873   -99.46281
25.46212   -46.62017   34.64971   -96.70271
39.97521   -80.27027   68.69710   -83.27182
42.74529   -73.73028   36.17318   -28.18201

Code:

import pandas as pd
import numpy as np
input_file = "input.csv"
output_file = "output.csv"
df = pd.read_csv(input_file)                       #Dataframe specification
df = df.convert_objects(convert_numeric = True)

def dist_from_coordinates(lat1, lon1, lat2, lon2):
  R = 6371  # Earth radius in km

  #conversion to radians
  d_lat = np.radians(lat2-lat1)
  d_lon = np.radians(lon2-lon1)

  r_lat1 = np.radians(lat1)
  r_lat2 = np.radians(lat2)

  #haversine formula
  a = np.sin(d_lat/2.) **2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon/2.)**2

  haversine = 2 * R * np.arcsin(np.sqrt(a))

  return haversine

new_column = []                    #empty column for distance
for index,row in df.iterrows():
  lat1 = row['lat1'] #first row of location.lat column here
  lon1 = row['lon1'] #first row of location.long column here
  lat2 = row['lat2'] #second row of location.lat column here
  lon2 = row['lon2'] #second row of location.long column here
  value = dist_from_coordinates(lat1, lon1, lat2, lon2)  #get the distance
  new_column.append(value)   #append the empty list with distance values

df.insert(4,"Distance",new_column)  #4 is the index where you want to place your column. Column index starts with 0. "Distance" is the header and new_column are the values in the column.

with open(output_file,'ab') as f:
  df.to_csv(f,index = False)       #creates the output.csv
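For reference, dist_from_coordinates implements the haversine formula. Written out with the same steps as the code, taking φ as latitude and λ as longitude (both converted to radians) and R = 6371 km:

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \arcsin\!\left(\sqrt{a}\right)
```

Here a is the variable a in the code and d is the returned haversine value. The formula treats the Earth as a sphere, so the result is an approximation.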

Output:

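The output.csv file should then be a separate file containing the original 4 columns plus a 5th column, Distance. A for loop can do this: the approach shown here reads each row, calculates the distance, appends it to a list that becomes the new "Distance" column, and finally writes output.csv.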
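Because np.radians, np.sin and the other NumPy functions used in dist_from_coordinates work element-wise on whole pandas Series, the row-by-row iterrows loop is not strictly required. The following is a minimal sketch of a vectorized variant. It assumes the column names lat1, lon1, lat2, lon2 from the sample data and the same spherical-Earth radius of 6371 km. The helper names haversine_km and add_distance_column are hypothetical and do not come from the question.

```python
import numpy as np
import pandas as pd

R_KM = 6371.0  # mean Earth radius in km, same value as in the question


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; works on scalars, NumPy arrays or pandas Series."""
    d_lat = np.radians(lat2 - lat1)
    d_lon = np.radians(lon2 - lon1)
    a = (np.sin(d_lat / 2.0) ** 2
         + np.cos(np.radians(lat1)) * np.cos(np.radians(lat2)) * np.sin(d_lon / 2.0) ** 2)
    return 2 * R_KM * np.arcsin(np.sqrt(a))


def add_distance_column(df):
    """Hypothetical helper: return a copy of df with a 'Distance' column (km) appended."""
    out = df.copy()
    # One call computes every row at once; no Python-level loop over rows.
    out["Distance"] = haversine_km(out["lat1"], out["lon1"], out["lat2"], out["lon2"])
    return out


if __name__ == "__main__":
    # First two rows of the sample data from the question.
    sample = pd.DataFrame({
        "lat1": [33.58144, 25.46212],
        "lon1": [-57.73018, -46.62017],
        "lat2": [32.44873, 34.64971],
        "lon2": [-99.46281, -96.70271],
    })
    print(add_distance_column(sample))
```

With the question's input, the equivalent would be df = add_distance_column(df) in place of the loop and df.insert call. Because the new column is assigned last, it ends up in position 4, just as with df.insert(4, ...).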

Error:

FutureWarning: convert_objects is deprecated.  To re-infer data dtypes for object columns, use DataFrame.infer_objects()
For all other conversions use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  after removing the cwd from sys.path.
TypeError                                 Traceback (most recent call last)
<ipython-input-8-ce103283fa0d> in <module>
     33 
     34 with open(output_file,'ab') as f:
---> 35   df.to_csv(f,index = False)       #creates the output.csv

~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
   3018                                  doublequote=doublequote,
   3019                                  escapechar=escapechar, decimal=decimal)
-> 3020         formatter.save()
   3021 
   3022         if path_or_buf is None:

~/anaconda3/lib/python3.7/site-packages/pandas/io/formats/csvs.py in save(self)
    170                 self.writer = UnicodeWriter(f, **writer_kwargs)
    171 
--> 172             self._save()
    173 
    174         finally:

~/anaconda3/lib/python3.7/site-packages/pandas/io/formats/csvs.py in _save(self)
    272     def _save(self):
    273 
--> 274         self._save_header()
    275 
    276         nrows = len(self.data_index)

~/anaconda3/lib/python3.7/site-packages/pandas/io/formats/csvs.py in _save_header(self)
    240         if not has_mi_columns or has_aliases:
    241             encoded_labels += list(write_cols)
--> 242             writer.writerow(encoded_labels)
    243         else:
    244             # write out the mi

TypeError: a bytes-like object is required, not 'str'

Similar problem:

Link to Similar Problem

You should apply the following corrections:

Instead of df = df.convert_objects(convert_numeric = True), which is deprecated (hence the FutureWarning) and was removed in pandas 1.0, use df[:] = df[:].apply(pd.to_numeric, errors='coerce')
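pd.to_numeric with errors='coerce' turns any value it cannot parse into NaN instead of raising an exception. Here is a small sketch on made-up data. The stray "n/a" string is an illustrative assumption and is not taken from the question. The sketch uses a plain reassignment, df = df.apply(...), which replaces the columns outright and so also updates their dtypes.

```python
import pandas as pd

# Made-up frame: one column was read as strings and one cell is unparsable.
df = pd.DataFrame({
    "lat1": ["33.58144", "25.46212", "n/a"],
    "lon1": [-57.73018, -46.62017, -80.27027],
})

# DataFrame.apply calls pd.to_numeric on each column and forwards errors="coerce";
# cells that cannot be parsed become NaN.
df = df.apply(pd.to_numeric, errors="coerce")

print(df.dtypes)                   # both columns are now float64
print(df["lat1"].isna().tolist())  # [False, False, True]
```

Rows that end up with NaN coordinates will also get a NaN distance. You can drop them first with df.dropna() if that is preferable.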

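Also, with open(output_file,'ab') as f: opens the file in binary mode. In Python 3, however, to_csv writes str, not bytes, which is exactly the "TypeError: a bytes-like object is required, not 'str'" in the traceback. Open the file in text mode instead: with open(output_file,'w') as f: (use 'a' if you really want to append rather than overwrite).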
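Below is a minimal sketch of two text-mode options. It assumes a small made-up DataFrame and writes to a temporary directory so the example is self-contained. The file names output_a.csv and output_b.csv are illustrative only. The newline="" argument follows the pandas to_csv documentation's advice for file objects you open yourself.

```python
import os
import tempfile

import pandas as pd

# Made-up frame standing in for the question's result.
df = pd.DataFrame({"lat1": [33.58144, 25.46212], "lon1": [-57.73018, -46.62017]})

out_dir = tempfile.mkdtemp()
path_a = os.path.join(out_dir, "output_a.csv")
path_b = os.path.join(out_dir, "output_b.csv")

# Option 1: pass the path and let pandas open and close the file itself.
df.to_csv(path_a, index=False)

# Option 2: open the file yourself in *text* mode ("w", not "ab").
# newline="" keeps line endings from being translated twice (an issue on Windows).
with open(path_b, "w", newline="") as f:
    df.to_csv(f, index=False)

print(pd.read_csv(path_a).equals(pd.read_csv(path_b)))  # True
```

Option 1 is usually the simplest choice. Opening the file yourself mainly helps when you want to write several things into the same handle.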

Then it should work.