Calculate averages from a pandas column whose elements contain lists of coordinates
I have a dataframe with a column 'geometry.coordinates'. Each row holds a nested list of [longitude, latitude] coordinate pairs, for example:
[[[120.789558, 17.41699],
[120.761307, 17.416771],
[120.744881, 17.437571],
[120.745842, 17.44907],
[120.727699, 17.457621],
[120.73217, 17.463221],
[120.762817, 17.54215],
[120.791496, 17.54603],
[120.817032, 17.51009],
[120.884644, 17.469419],
[120.789223, 17.44525],
[120.789558, 17.41699]]]
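I want to create another column containing the mean longitude and the mean latitude of all the pairs in each list. How can I do this?
Sample rows from the DataFrame: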
+----+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| | properties.NAME_2 | geometry.coordinates |
|----+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | Sallapadan | [[[120.789558, 17.41699], [120.761307, 17.416771], [120.744881, 17.437571], [120.745842, 17.44907], [120.727699, 17.457621], [120.73217, 17.463221], [120.762817, 17.54215], [120.791496, 17.54603], [120.817032, 17.51009], [120.884644, 17.469419], [120.789223, 17.44525], [120.789558, 17.41699]]] |
| 1 | San Isidro | [[[120.630783, 17.43194], [120.578957, 17.44137], [120.584541, 17.476851], [120.584137, 17.48283], [120.605492, 17.502029], [120.615356, 17.494249], [120.672997, 17.49074], [120.673241, 17.46966], [120.618919, 17.46871], [120.621872, 17.446251], [120.630783, 17.43194]]] |
| 2 | San Juan | [[[120.782753, 17.71497], [120.779747, 17.66584], [120.724838, 17.665701], [120.707687, 17.687611], [120.712273, 17.711361], [120.711327, 17.726721], [120.721786, 17.732639], [120.750092, 17.724319], [120.785896, 17.76413], [120.811371, 17.740749], [120.782753, 17.71497]]] |
+----+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
The new column would look like this:
Center
120.778018916667, 17.4642644166667
120.619734363636, 17.4669609090909
120.751865727273, 17.7135464545455
Use np.mean with axis=1 to average over the coordinate pairs, then select the single nested result with [0]:
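import numpy as np
# Each cell has shape (1, n_points, 2): the mean over axis=1 has shape (1, 2), and [0] extracts the pair. This overwrites the original column.
df['geometry.coordinates'] = df['geometry.coordinates'].apply(lambda x: np.mean(x, axis=1)[0])
print(df)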
properties.NAME_2 geometry.coordinates
0 Sallapadan [120.77801891666665, 17.464264416666666]
1 San Isidro [120.61973436363637, 17.466960909090908]
2 San Juan [120.75186572727272, 17.713546454545455]
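Since the question asks for a separate column, here is a minimal sketch of a variant that leaves 'geometry.coordinates' untouched and writes the result to a new 'Center' column. It assumes every cell contains exactly one ring (one outer list), as in the sample rows. The helper name `ring_mean` and the columns `center_lon` and `center_lat` are illustrative names, not part of the original post.

```python
import numpy as np
import pandas as pd

# Two of the sample rows from the question.
df = pd.DataFrame({
    "properties.NAME_2": ["Sallapadan", "San Isidro"],
    "geometry.coordinates": [
        [[[120.789558, 17.41699], [120.761307, 17.416771], [120.744881, 17.437571],
          [120.745842, 17.44907], [120.727699, 17.457621], [120.73217, 17.463221],
          [120.762817, 17.54215], [120.791496, 17.54603], [120.817032, 17.51009],
          [120.884644, 17.469419], [120.789223, 17.44525], [120.789558, 17.41699]]],
        [[[120.630783, 17.43194], [120.578957, 17.44137], [120.584541, 17.476851],
          [120.584137, 17.48283], [120.605492, 17.502029], [120.615356, 17.494249],
          [120.672997, 17.49074], [120.673241, 17.46966], [120.618919, 17.46871],
          [120.621872, 17.446251], [120.630783, 17.43194]]],
    ],
})


def ring_mean(coords):
    """Return [mean_longitude, mean_latitude] for the first (assumed only) ring."""
    ring = np.asarray(coords[0], dtype=float)  # shape (n_points, 2)
    return ring.mean(axis=0).tolist()          # average over the points


# New column; the original coordinate lists are kept as they are.
df["Center"] = df["geometry.coordinates"].apply(ring_mean)

# Optional: split the pair into two numeric columns for easier filtering or plotting.
df["center_lon"] = df["Center"].apply(lambda c: c[0])
df["center_lat"] = df["Center"].apply(lambda c: c[1])

print(df[["properties.NAME_2", "center_lon", "center_lat"]])
```

The first and last vertex of each ring are the same point, so that point is counted twice in this mean. That matches the expected values in the question, but it is a plain average of vertices rather than a geometric centroid of the polygon.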