Converting pandas data frame with degree minute second (DMS) coordinates to decimal degrees
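I have a data frame like the one below, and I would like to convert the Latitude and Longitude columns from degree, minute, second format to decimal degrees, with negative values for the correct hemisphere. Is there an easy way to do that?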
Parent Company CPO PKO Latitude Longitude
Incasi Raya X 0°51'56.29"S 101°26'46.29"E
Incasi Raya X 1°23'39.29"S 101°35'30.45"E
Incasi Raya X 0°19'56.63"N 99°22'56.36"E
Incasi Raya X 0°21'45.91"N 99°37'59.68"E
Incasi Raya X 1°41'6.56"S 102°14'7.68"E
Incasi Raya X 1°15'2.13"S 101°34'30.38"E
Incasi Raya X 2°19'44.26"S 100°59'34.55"E
Musim Mas X 1°44'55.94"N 101°22'15.94"E
For example, 0°51'56.29"S would be converted to -0.8656361.
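Based on a function from another SO answer, you can do it like this:
Interestingly, for a dataset with 500+ rows, this answer is also about 2x faster than the answers from MaxU and Amis. My bet is that the bottleneck is str.extract(), but something is clearly strange.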
import pandas as pd
import re

def dms2dd(s):
    # example: s = """0°51'56.29"S"""
    degrees, minutes, seconds, direction = re.split('[°\'"]+', s)
    dd = float(degrees) + float(minutes)/60 + float(seconds)/(60*60)
    if direction in ('S','W'):
        dd *= -1
    return dd

df = pd.DataFrame({'CPO': {0: 'Raya', 1: 'Raya'},
                   'Latitude': {0: '0°51\'56.29"S', 1: '1°23\'39.29"S'},
                   'Longitude': {0: '101°26\'46.29"E', 1: '101°35\'30.45"E'},
                   'PKO': {0: 'X', 1: 'X'},
                   'ParentCompany': {0: 'Incasi', 1: 'Incasi'}})
df['Latitude'] = df['Latitude'].apply(dms2dd)
df['Longitude'] = df['Longitude'].apply(dms2dd)
Printing df returns:
CPO Latitude Longitude PKO ParentCompany
0 Raya -0.865636 101.446192 X Incasi
1 Raya -1.394247 101.591792 X Incasi
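The function above assumes every value has exactly the form D°M'S"H. As a minimal sketch (the name dms_to_decimal and the pattern are my own assumptions, not part of the answer above), a single regular expression can make the parser tolerate a missing seconds mark or a lowercase hemisphere letter, and fail loudly on anything else:

```python
import re

# Hypothetical pattern: degrees, minutes, seconds, optional seconds mark, hemisphere letter.
_DMS = re.compile(r"""^\s*(\d+)°\s*(\d+)'\s*(\d+(?:\.\d+)?)(?:"|'')?\s*([NSEWnsew])\s*$""")

def dms_to_decimal(s):
    match = _DMS.match(s)
    if match is None:
        raise ValueError(f"not a DMS coordinate: {s!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # South and west are negative by convention.
    return -value if hemisphere.upper() in ("S", "W") else value

print(round(dms_to_decimal("0°51'56.29\"S"), 6))   # -0.865636
print(round(dms_to_decimal("101°26'46.29E"), 6))   # 101.446192
```

It can be used the same way as dms2dd, for example df['Latitude'].apply(dms_to_decimal).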
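Update: to correct your error, you could do something along these lines (this adds the missing " before the hemisphere letter):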
m = df['Latitude'].str[-2] != '"'
df.loc[m, 'Latitude'] = df.loc[m, 'Latitude'].str[:-1] + '"' + df.loc[m, 'Latitude'].str[-1]
Full example:
import pandas as pd
s1 = """0°51'56.29"S"""
s2 = """0°51'56.29S"""
df = pd.Series((s1,s2)).to_frame(name='Latitude')
m = df['Latitude'].str[-2] != '"'
df.loc[m, 'Latitude'] = df.loc[m, 'Latitude'].str[:-1] + '"' + df.loc[m, 'Latitude'].str[-1]
print(df)
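A regular-expression variant of the same repair is sketched below. It assumes, as above, that the only defect is a missing " directly before the trailing hemisphere letter; the helper name add_seconds_mark is hypothetical.

```python
import pandas as pd

def add_seconds_mark(col):
    # Insert '"' before a trailing N/S/E/W that is not already preceded by one.
    return col.str.replace(r'(?<!")([NSEW])$', r'"\1', regex=True)

lat = pd.Series(['0°51\'56.29"S', "0°51'56.29S"])
fixed = add_seconds_mark(lat)
print(fixed.tolist())
```

Values that already carry the mark are left unchanged, so the function is safe to run on the whole column without building a mask first.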
Here is a vectorized approach that also uses matrix * vector ([1, 1./60, 1./3600]) multiplication:
In [233]: %paste
def dms2dec(s):
    x = (s.str.upper()
          .str.split(r'[°\'"]', expand=True)
          .replace(['S','W','N','E'], [-1,-1,1,1])
          .astype('float'))
    return x.iloc[:, :3].dot([1, 1./60, 1./3600]).mul(x.iloc[:, 3])
## -- End pasted text --
In [234]: df[['Latitude','Longitude']] = df[['Latitude','Longitude']].apply(dms2dec)
In [235]: df
Out[235]:
Parent Company CPO PKO Latitude Longitude
0 Incasi Raya X -0.865636 101.446192
1 Incasi Raya X -1.394247 101.591792
2 Incasi Raya X 0.332397 99.382322
3 Incasi Raya X 0.362753 99.633244
4 Incasi Raya X -1.685156 102.235467
5 Incasi Raya X -1.250592 101.575106
6 Incasi Raya X -2.328961 100.992931
7 Musim Mas X 1.748872 101.371094
Step-by-step explanation:
In [239]: x = (s.str.upper()
     ...:       .str.split(r'[°\'"]', expand=True)
     ...:       .replace(['S','W','N','E'], [-1,-1,1,1])
     ...:       .astype('float'))
In [240]: x
Out[240]:
0 1 2 3
0 0.0 51.0 56.29 -1.0
1 1.0 23.0 39.29 -1.0
2 0.0 19.0 56.63 1.0
3 0.0 21.0 45.91 1.0
4 1.0 41.0 6.56 -1.0
5 1.0 15.0 2.13 -1.0
6 2.0 19.0 44.26 -1.0
7 1.0 44.0 55.94 1.0
In [241]: x.iloc[:, :3].dot([1, 1./60, 1./3600])
Out[241]:
0 0.865636
1 1.394247
2 0.332397
3 0.362753
4 1.685156
5 1.250592
6 2.328961
7 1.748872
dtype: float64
In [242]: x.iloc[:, :3].dot([1, 1./60, 1./3600]).mul(x.iloc[:, 3])
Out[242]:
0 -0.865636
1 -1.394247
2 0.332397
3 0.362753
4 -1.685156
5 -1.250592
6 -2.328961
7 1.748872
dtype: float64
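The weighted sum in step In [241] is an ordinary matrix-vector product. A minimal NumPy sketch with rows 0 and 2 of the table above (the array names are illustrative, not from the answer):

```python
import numpy as np

# Columns: degrees, minutes, seconds (rows 0 and 2 of the split table).
dms = np.array([[0.0, 51.0, 56.29],
                [0.0, 19.0, 56.63]])
sign = np.array([-1.0, 1.0])                  # S -> -1, N -> +1
weights = np.array([1.0, 1 / 60, 1 / 3600])   # degrees per unit of each column

decimal = (dms @ weights) * sign
print(decimal.round(6))
```

Each row is multiplied by the same weights, so the conversion is done for all rows at once without a Python-level loop.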
You can use pd.Series.str.extract for a vectorized operation. For the longitude, for example:
>>> parts = df.Longitude.str.extract(r'(\d+)°(\d+)\'([^"]+)"([NSEW])', expand=True)
>>> (parts[0].astype(int) + parts[1].astype(float) / 60 + parts[2].astype(float) / 3600) * parts[3].map({'N':1, 'S':-1, 'E': 1, 'W':-1})
0 101.446192
1 101.591792
2 99.382322
3 99.633244
4 102.235467
5 101.575106
6 100.992931
7 101.371094
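The same extraction can be wrapped in a function and applied to both columns. The sketch below uses named groups so the parts can be referred to by name; the names PATTERN and extract_decimal and the two-row sample frame are assumptions for illustration.

```python
import pandas as pd

PATTERN = r"""(?P<deg>\d+)°(?P<min>\d+)'(?P<sec>[\d.]+)"(?P<hem>[NSEW])"""

def extract_decimal(col):
    # One column per named group: deg, min, sec, hem.
    parts = col.str.extract(PATTERN)
    sign = parts["hem"].map({"N": 1, "E": 1, "S": -1, "W": -1})
    return (parts["deg"].astype(float)
            + parts["min"].astype(float) / 60
            + parts["sec"].astype(float) / 3600) * sign

df = pd.DataFrame({"Latitude": ['0°51\'56.29"S', '0°19\'56.63"N'],
                   "Longitude": ['101°26\'46.29"E', '99°22\'56.36"E']})
out = df[["Latitude", "Longitude"]].apply(extract_decimal)
print(out.round(6))
```

Rows that do not match the pattern come back as NaN rather than raising, which may or may not be what you want.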
You can use the function clean_lat_long() from the DataPrep library. Install it with pip install dataprep.
import pandas as pd
from dataprep.clean import clean_lat_long
df = pd.DataFrame({"Latitude": ["0°51'56.29''S", "1°23'39.29''S", "0°19'56.63''N"],
                   "Longitude": ["101°26'46.29''E", "101°35'30.45''E", "99°22'56.36''E"]})
df2 = clean_lat_long(df, lat_col="Latitude", long_col="Longitude", split=True)
df2
Latitude Longitude Latitude_clean Longitude_clean
0 0°51'56.29''S 101°26'46.29''E -0.8656 101.4462
1 1°23'39.29''S 101°35'30.45''E -1.3942 101.5918
2 0°19'56.63''N 99°22'56.36''E 0.3324 99.3823