How to convert string to radians in pandas to calculate distance between two points

I have a dataframe df:

{'city': {0: 'Adak', 1: 'Akiachak', 2: 'Akiak', 3: 'Akutan', 4: 'Alakanuk'},
 'latitudedegrees': {0: '51.87957',
  1: '60.88981',
  2: '60.911865',
  3: '54.098693',
  4: '62.683391'},
 'latituderadians': {0: 0.9054693110188746,
  1: 1.0627276654137685,
  2: 1.0631125977802958,
  3: 0.9442003138756087,
  4: 1.094031559264981},
 'longitudedegrees': {0: '-176.63675',
  1: '-161.42393',
  2: '-161.22577',
  3: '-165.88176',
  4: '-164.65455'},
 'longituderadians': {0: -3.082892867522094,
  1: -2.8173790700088506,
  2: -2.8139205255630984,
  3: -2.8951828810030293,
  4: -2.8737640258896295},
 'ncity': {0: 'Dallas', 1: 'Dallas', 2: 'Dallas', 3: 'Dallas', 4: 'Dallas'},
 'nlatituderadians': {0: 0.5722195078367402,
  1: 0.5722195078367402,
  2: 0.5722195078367402,
  3: 0.5722195078367402,
  4: 0.5722195078367402},
 'nlongituderadians': {0: -1.6891776914122487,
  1: -1.6891776914122487,
  2: -1.6891776914122487,
  3: -1.6891776914122487,
  4: -1.6891776914122487},
 'nstate': {0: 'TX', 1: 'TX', 2: 'TX', 3: 'TX', 4: 'TX'},
 'state': {0: 'AK', 1: 'AK', 2: 'AK', 3: 'AK', 4: 'AK'},
 'zip': {0: '99546', 1: '99551', 2: '99552', 3: '99553', 4: '99554'}}

It is a Cartesian product against the 'ncity' list and has a few million rows. The original file is here:

https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/export/

The df already has the radians, but they were read in as strings, so they don't run through this function:

def distanceBetweenCityInMiles(lat1, long1, lat2, long2): # assumes latitudes and longitudes are in radians
    d = np.arccos(np.sin(lat1)*np.sin(lat2)+np.cos(lat1)*np.cos(lat2)*np.cos(long1-long2))
    distance_km = 6371 * d # distance_km ≈ radius_km * distance_radians ≈ 6371 * d, where 6371 km is the average radius of the earth
    distance_mi = distance_km * 0.621371
    return distance_mi

I tried converting to float:

df[['nlatituderadians','nlongituderadians','latituderadians','longituderadians']]=df[['nlatituderadians','nlongituderadians','latituderadians','longituderadians']].astype(float)

But I still get this error:

df['ncitydistance']= distanceBetweenCityInMiles('nlatituderadians', 'nlongituderadians', 'latituderadians', 'longituderadians')

TypeError: ufunc 'sin' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

As you can see, all the data for each pair is on a single row, and I need to calculate the distance between the nlat/nlong and lat/long values.

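How do I convert these strings to radians so the data can run through the distance function? I assume that is why this isn't working. The end result should be another column giving the distance between the cities.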

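There are a few problems with your function. You pass only the column names, without specifying the dataframe, so np.sin receives the string 'nlatituderadians' rather than the values in that column. Instead of using the column-name variable lat1 on its own, pass the dataframe in as well and index it with the column name to actually select the column: df[lat1]: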

def distanceBetweenCityInMiles(df, lat1, long1, lat2, long2): # assumes latitudes and longitudes are in radians
    d = np.arccos(np.sin(df[lat1])*np.sin(df[lat2])+np.cos(df[lat1])*np.cos(df[lat2])*np.cos(df[long1]-df[long2]))
    distance_km = 6371 * d # distance_km ≈ radius_km * distance_radians ≈ 6371 * d, where 6371 km is the average radius of the earth
    distance_mi = distance_km * 0.621371
    return distance_mi


df['ncitydistance'] =  distanceBetweenCityInMiles(df, 'nlatituderadians', 'nlongituderadians', 'latituderadians', 'longituderadians')
df['ncitydistance']
Out[1]: 
0    4065.460680
1    3426.266819
2    3419.729121
3    3598.672064
4    3538.417833
Name: ncitydistance, dtype: float64
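
If the radian columns really had arrived as strings, one way to rebuild them is from the degree strings that are already in the frame. The following is only a minimal sketch under assumptions: it uses two rows copied from the sample data above, the names distance_in_miles and cos_angle are illustrative and not from the original post, and the clip step is an added guard against rounding pushing the cosine slightly outside [-1, 1], which would make arccos return NaN.

```python
import numpy as np
import pandas as pd

# Two rows taken from the sample data in the question; degrees are strings.
df = pd.DataFrame({
    "city": ["Adak", "Akiachak"],
    "latitudedegrees": ["51.87957", "60.88981"],
    "longitudedegrees": ["-176.63675", "-161.42393"],
    "nlatituderadians": [0.5722195078367402, 0.5722195078367402],
    "nlongituderadians": [-1.6891776914122487, -1.6891776914122487],
})

# Parse the strings to floats, then convert degrees to radians.
# errors="coerce" turns any unparseable string into NaN instead of raising.
for axis in ("latitude", "longitude"):
    degrees = pd.to_numeric(df[axis + "degrees"], errors="coerce")
    df[axis + "radians"] = np.radians(degrees)


def distance_in_miles(df, lat1, long1, lat2, long2):
    """Spherical law of cosines; the named columns must hold radians."""
    cos_angle = (
        np.sin(df[lat1]) * np.sin(df[lat2])
        + np.cos(df[lat1]) * np.cos(df[lat2]) * np.cos(df[long1] - df[long2])
    )
    # Assumed guard: keep the value inside arccos's domain.
    d = np.arccos(cos_angle.clip(-1.0, 1.0))
    return 6371 * d * 0.621371  # mean Earth radius in km, then km to miles


df["ncitydistance"] = distance_in_miles(
    df, "nlatituderadians", "nlongituderadians",
    "latituderadians", "longituderadians",
)
print(df[["city", "latituderadians", "longituderadians", "ncitydistance"]])
```

Checking df.dtypes before and after the conversion is a quick way to confirm whether a column holds strings (object) or floats (float64).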