DataFrame: input must be a unicode string, not 0 ...; how do I pass a string instead of the whole DataFrame column?
I'm trying to manipulate some DataFrames, and I wrote a function that computes the distance between two cities:
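```python
from geopy.distance import geodesic
from opencage.geocoder import OpenCageGeocode

def find_distance(A, B):
    key = '0377f0e6b42a47fe9d30a4e9a2b3bb63'  # get api key from: https://opencagedata.com
    geocoder = OpenCageGeocode(key)
    # Geocode each place name and read the coordinates of the first result
    result_A = geocoder.geocode(A)
    lat_A = result_A[0]['geometry']['lat']
    lng_A = result_A[0]['geometry']['lng']
    result_B = geocoder.geocode(B)
    lat_B = result_B[0]['geometry']['lat']
    lng_B = result_B[0]['geometry']['lng']
    # Geodesic distance between the two points, truncated to whole kilometres
    return int(geodesic((lat_A, lng_A), (lat_B, lng_B)).kilometers)
```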
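`find_distance` builds a new `OpenCageGeocode` and geocodes `A` on every call, so once it is called row by row, "Paris" gets looked up again for every city. A minimal sketch of memoising the lookup with `functools.lru_cache`, assuming a hypothetical offline `geocode_stub` in place of `OpenCageGeocode(key).geocode` that returns the same `[{'geometry': {'lat': ..., 'lng': ...}}]` shape (approximate hard-coded coordinates, no API call):

```python
from functools import lru_cache

# Hypothetical offline stand-in for OpenCageGeocode(key).geocode: returns the same
# [{'geometry': {'lat': ..., 'lng': ...}}] shape, from approximate coordinates.
_FAKE_COORDS = {"Paris": (48.8566, 2.3522), "Lyon": (45.7640, 4.8357)}
CALLS = {"count": 0}  # how many times the "API" was actually hit

def geocode_stub(query):
    CALLS["count"] += 1
    lat, lng = _FAKE_COORDS[query]
    return [{"geometry": {"lat": lat, "lng": lng}}]

@lru_cache(maxsize=None)
def coords(place):
    """Geocode each distinct place name once; repeated names are served from the cache."""
    result = geocode_stub(place)
    return result[0]["geometry"]["lat"], result[0]["geometry"]["lng"]

# "Paris" is requested three times but geocoded only once.
for place in ["Paris", "Lyon", "Paris", "Paris"]:
    coords(place)
```

In the real function, `coords` would call `geocoder.geocode(place)` on a geocoder created once outside the function, and `find_distance` would start with `lat_A, lng_A = coords(A)` and `lat_B, lng_B = coords(B)`.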
Here is my DataFrame:
2 32 Mulhouse 1874.0 2 797 16.8 16,3 € 10.012786
13 13 Saint-Étienne 1994.0 3 005 14.3 13,5 € 8.009882
39 39 Roubaix 2845.0 2 591 17.4 15,0 € 6.830968
27 27 Perpignan 2507.0 3 119 15.1 13,3 € 6.727255
40 40 Tourcoing 3089.0 2 901 17.5 15,3 € 6.327547
25 25 Limoges 2630.0 2 807 14.2 12,5 € 6.030424
20 20 Le Mans 2778.0 3 202 14.4 12,3 € 5.789559
Here is my code:
```python
import pandas as pd

def clean_text(row):
    # return the list of decoded cell in the Series instead
    return [r.decode('unicode_escape').encode('ascii', 'ignore') for r in row]

def main():
    inFile = "prix_m2_france.xlsx"  # the Excel file to open
    inSheetName = "Sheet1"  # the sheet name
    cols = ['Ville', 'Prix_moyen', 'Loyer_moyen']  # the columns to clean
    df = pd.read_excel(inFile, sheet_name=inSheetName)
    # Strip the euro sign and spaces, and turn the decimal comma into a dot
    df[cols] = df[cols].replace({'€': '', ",": ".", " ": "", "\u202f": ""}, regex=True)
    # df['Prix_moyen'] = df.apply(clean_text)
    # df['Loyer_moyen'] = df.apply(clean_text)
    df['Prix_moyen'] = df['Prix_moyen'].astype(float)
    df['Loyer_moyen'] = df['Loyer_moyen'].astype(float)
    # df["Prix_moyen"] += 1
    df["revenu"] = (df['Loyer_moyen'] * 12) / (df["Prix_moyen"] * 1.0744) * 100
    # df['Ville'].replace({'Le-Havre': 'Le Havre', 'Le-Mans': 'Le Mans'})
    df["Ville"] = df['Ville'].replace(['Le-Havre', 'Le-Mans'], ['Le Havre', 'Le Mans'])
    df["distance"] = find_distance("Paris", df["Ville"])
    df2 = df.sort_values(by='revenu', ascending=False)
    print(df2.head(90))

main()
```
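The `replace(..., regex=True)` call followed by `astype(float)` is what turns strings such as `2 797` and `16,3 €` into numbers before `revenu` is computed. A minimal single-cell sketch of the same substitutions using Python's `re` module, plus the `revenu` formula, for checking the arithmetic by hand; the two input strings are taken from the pasted rows, and treating them as a `Prix_moyen` / `Loyer_moyen` pair is an assumption:

```python
import re

# Same substitutions as the question's df[cols].replace({...}, regex=True) call,
# applied to one cell: drop '€', turn the decimal comma into a dot, and remove
# ordinary spaces and narrow no-break spaces (U+202F) used as thousands separators.
REPLACEMENTS = {"€": "", ",": ".", " ": "", "\u202f": ""}

def clean_number(cell):
    for pattern, repl in REPLACEMENTS.items():
        cell = re.sub(pattern, repl, cell)
    return float(cell)

def revenu(loyer_moyen, prix_moyen):
    # Same formula as the question: 12 months of rent divided by
    # (price * 1.0744), expressed as a percentage.
    return loyer_moyen * 12 / (prix_moyen * 1.0744) * 100

prix = clean_number("2\u202f797")  # thousands separator as a narrow no-break space
loyer = clean_number("16,3 €")
print(prix, loyer, round(revenu(loyer, prix), 2))
```

For these two inputs the result is 16.3 × 12 / (2797 × 1.0744) × 100 ≈ 6.51.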
`df["distance"] = find_distance("Paris", df["Ville"])` fails with this error:
```
opencage.geocoder.InvalidInputError: Input must be a unicode string, not 0    Paris
1    Marseille
2    Lyon
3    T
```
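I pictured this working like a loop that fills in the distance between Paris and each city, but I guess it passes the whole DataFrame column in as my first value.

Thanks for your help.

(Edit: I only pasted part of my DataFrame.)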
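The `0 Paris / 1 Marseille / 2 Lyon` text in the error is how pandas prints a Series (index, then value), which shows that the entire `df["Ville"]` column reached the geocoder as one argument. A minimal sketch of that failure mode, assuming a hypothetical `validate_query` that only accepts `str` (the OpenCage library's actual check may be written differently), with a plain list standing in for the column:

```python
def validate_query(query):
    """Hypothetical stand-in for the geocoder's input check: only str is accepted."""
    if not isinstance(query, str):
        raise TypeError(f"Input must be a unicode string, not {query}")
    return query

cities = ["Paris", "Marseille", "Lyon"]  # plays the role of df["Ville"]

# Passing the whole collection at once: a single call with a non-string argument.
try:
    validate_query(cities)
    whole_column_ok = True
except TypeError:
    whole_column_ok = False

# Passing one element at a time: each call receives a plain string.
per_city = [validate_query(city) for city in cities]
```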
You could try something like:

```python
df["distance"] = [find_distance("Paris", city) for city in df["Ville"]]
```
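The comprehension works because iterating over a Series yields its values one at a time, so `find_distance` receives a single string per call. A slightly extended sketch that also geocodes the fixed origin only once and returns `None` for names the geocoder cannot match (instead of an `IndexError` on `result[0]`). Assumptions: `geocode_stub` is a hypothetical offline stand-in for `OpenCageGeocode(key).geocode` with approximate coordinates, an unmatched name is assumed to come back as an empty list, and the haversine great-circle formula replaces geopy's `geodesic`, so distances can differ slightly from the question's function:

```python
import math

# Hypothetical offline stand-in for OpenCageGeocode(key).geocode. Coordinates are
# approximate; an unknown name returns [] (assumed "no match" behaviour).
_COORDS = {
    "Paris": (48.8566, 2.3522),
    "Lyon": (45.7640, 4.8357),
    "Limoges": (45.8336, 1.2611),
}

def geocode_stub(query):
    if query not in _COORDS:
        return []
    lat, lng = _COORDS[query]
    return [{"geometry": {"lat": lat, "lng": lng}}]

def to_latlng(result):
    """Read (lat, lng) from the first result, as the question's code does."""
    return result[0]["geometry"]["lat"], result[0]["geometry"]["lng"]

def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance on a sphere; an approximation of geopy's geodesic."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

def distances_from(origin, cities, geocode=geocode_stub):
    """Distance in whole km from origin to each city, in order; None if not found."""
    origin_ll = to_latlng(geocode(origin))  # the fixed origin is geocoded only once
    out = []
    for city in cities:  # one string per call, never the whole column
        result = geocode(city)
        out.append(int(haversine_km(origin_ll, to_latlng(result))) if result else None)
    return out

print(distances_from("Paris", ["Lyon", "Atlantis", "Limoges"]))
```

With the question's DataFrame this would be `df["distance"] = distances_from("Paris", df["Ville"])`; the returned list has one entry per row, in row order, so it can be assigned as a column directly.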