如何将一列中的数据拆分为Python中的一些单独的列?
How to split data in a column into some separate columns in Python?
所以,我有一个数据框如下:
import pandas as pd
df = pd.DataFrame(
{
"id": [8233037, 8233313],
"geometry": [
"{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612386, -6.93127], [107.612681, -6.930843], [107.612796, -6.930606], [107.61285, -6.930428], [107.612932, -6.929788], [107.612968, -6.929409], [107.613061, -6.928426], [107.613064, -6.927535], [107.613086, -6.927077], [107.613084, -6.926826], [107.612923, -6.926555], [107.612596, -6.926166], [107.611872, -6.925277], [107.61177, -6.925134], [107.611729, -6.925015], [107.611715, -6.92489], [107.611715, -6.92473], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]]]}",
"{'type': 'MultiLineString', 'coordinates': [[[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832], [107.625456, -6.910273], [107.625764, -6.910353], [107.625871, -6.910358], [107.626035, -6.910264]]]}",
],
}
)
df
我想在 og 数据帧中分别包含一些单行字符串的结果,例如 [107.625764, -6.910353]、[107.625871, -6.910358],拆分为 107.625764, -6.910353 。预期结果的详细信息如下图所示。
Expected Results
据我所知,我们可以应用 str.split 方法并指定任何特定的分隔符。方法如下:
df[
["coordinate1", "coordinate2", "coordinate3", "coordinate4", "coordinate-n"]
] = df.geometry.str.split(
" ",
expand=True,
)
问题:我不知道应该在 str.split(" ").
之后放置正确的分隔符
如何操作 DataFrame 列中的值,直到得到预期的 table,如下图所示?
Expected Results
我有一个纯 python 的解决方案。
首先geometry
是一个类似JSON的字符串,但是它有语法错误关键是单引号,JSON需要双引号,所以我用yaml
解析它。
然后我只需要格式化它来列出数据
import yaml
df = pd.DataFrame({ 'id':[8233037,8233313],
'geometry': ["{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612386, -6.93127], [107.612681, -6.930843], [107.612796, -6.930606], [107.61285, -6.930428], [107.612932, -6.929788], [107.612968, -6.929409], [107.613061, -6.928426], [107.613064, -6.927535], [107.613086, -6.927077], [107.613084, -6.926826], [107.612923, -6.926555], [107.612596, -6.926166], [107.611872, -6.925277], [107.61177, -6.925134], [107.611729, -6.925015], [107.611715, -6.92489], [107.611715, -6.92473], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]]]}","{'type': 'MultiLineString', 'coordinates': [[[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832], [107.625456, -6.910273], [107.625764, -6.910353], [107.625871, -6.910358], [107.626035, -6.910264]]]}"]})
data = []
for _,row in df.iterrows():
id = row['id']
geo = yaml.load(row['geometry'])['coordinates']
geos = []
for g in geo:
geos += g
data += [[id,g[0],g[1]] for g in geos]
df_new = pd.DataFrame(data,columns=['id','latitude','longtitude'])
df_new
id latitude longtitude
0 8233037 107.612018 -6.921755
1 8233037 107.611888 -6.923030
2 8233037 107.611715 -6.924730
3 8233037 107.611715 -6.924890
4 8233037 107.611729 -6.925015
... ... ... ...
199 8233313 107.623747 -6.909832
200 8233313 107.625456 -6.910273
201 8233313 107.625764 -6.910353
202 8233313 107.625871 -6.910358
203 8233313 107.626035 -6.910264
204 rows × 3 columns
geometry
看起来像 JSON
字符串,所以首先我会使用模块 json
将它从字符串转换为普通 list/dict。晚些时候你可以更简单地访问值。
但它不正确 JSON
所以我可以为此使用模块 dirtyjson
df['data'] = df['geometry'].apply(lambda row:dirtyjson.loads(row))
print(df['data'])
或者(幸运的是)我可以用 "
替换 '
以获得正确的 JSON
df['data'] = df['geometry'].apply(lambda row:json.loads(row.replace("'", '"')))
结果
0 {'type': 'MultiLineString', 'coordinates': [[[...
1 {'type': 'MultiLineString', 'coordinates': [[[...
Name: data, dtype: object
接下来我只得到coordinates
df['coordinates'] = df['data'].apply(lambda row:row['coordinates'])
print(df['coordinates'])
结果
0 [[[107.612018, -6.921755], [107.611888, -6.923...
1 [[[107.614077, -6.91033], [107.614837, -6.9100...
Name: coordinates, dtype: object
它是嵌套列表,所以我把它压扁了
def flatten(row):
result = []
for item in row:
result += item
return result
df['coordinates'] = df['coordinates'].apply(flatten)
print(df['coordinates'])
或者我可以使用 sum()
和 []
作为起始值
df['coordinates'] = df['coordinates'].apply(lambda row: sum(row, []))
结果
0 [[107.612018, -6.921755], [107.611888, -6.9230...
1 [[107.614077, -6.91033], [107.614837, -6.91005...
Name: coordinates, dtype: object
现在我可以 explode
将每一对放在单独的行中 id
df = df.explode('coordinates')
print(df[['id', 'coordinates']])
结果
id coordinates
0 8233037 [107.612018, -6.921755]
0 8233037 [107.611888, -6.92303]
0 8233037 [107.611715, -6.92473]
0 8233037 [107.611715, -6.92489]
0 8233037 [107.611729, -6.925015]
.. ... ...
1 8233313 [107.623747, -6.909832]
1 8233313 [107.625456, -6.910273]
1 8233313 [107.625764, -6.910353]
1 8233313 [107.625871, -6.910358]
1 8233313 [107.626035, -6.910264]
[204 rows x 2 columns]
我可以使用 Series
将 coordinates
转换成两行
df[ ['lat', 'long'] ] = df['coordinates'].apply(pd.Series)
print(df)
结果:
id lat long
0 8233037 107.612018 -6.921755
0 8233037 107.611888 -6.923030
0 8233037 107.611715 -6.924730
0 8233037 107.611715 -6.924890
0 8233037 107.611729 -6.925015
.. ... ... ...
1 8233313 107.623747 -6.909832
1 8233313 107.625456 -6.910273
1 8233313 107.625764 -6.910353
1 8233313 107.625871 -6.910358
1 8233313 107.626035 -6.910264
[204 rows x 3 columns]
完整的工作代码
import pandas as pd
import json
import dirtyjson
df = pd.DataFrame({
'id': [8233037, 8233313],
'geometry': ["{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612386, -6.93127], [107.612681, -6.930843], [107.612796, -6.930606], [107.61285, -6.930428], [107.612932, -6.929788], [107.612968, -6.929409], [107.613061, -6.928426], [107.613064, -6.927535], [107.613086, -6.927077], [107.613084, -6.926826], [107.612923, -6.926555], [107.612596, -6.926166], [107.611872, -6.925277], [107.61177, -6.925134], [107.611729, -6.925015], [107.611715, -6.92489], [107.611715, -6.92473], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]]]}","{'type': 'MultiLineString', 'coordinates': [[[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832], [107.625456, -6.910273], [107.625764, -6.910353], [107.625871, -6.910358], [107.626035, -6.910264]]]}"]
})
#df['data'] = df['geometry'].apply(lambda row:dirtyjson.loads(row))
df['data'] = df['geometry'].apply(lambda row:json.loads(row.replace("'", '"')))
print(df['data'])
df['coordinates'] = df['data'].apply(lambda row:row['coordinates'])
print(df['coordinates'])
def flatten(row):
result = []
for item in row:
result += item
return result
#df['coordinates'] = df['coordinates'].apply(flatten)
df['coordinates'] = df['coordinates'].apply(lambda row: sum(row, []))
print(df['coordinates'])
df = df.explode('coordinates')
print(df[['id', 'coordinates']])
df[ ['lat', 'long'] ] = df['coordinates'].apply(pd.Series)
print(df[['id', 'lat', 'long']])
所以,我有一个数据框如下:
import pandas as pd
df = pd.DataFrame(
{
"id": [8233037, 8233313],
"geometry": [
"{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612386, -6.93127], [107.612681, -6.930843], [107.612796, -6.930606], [107.61285, -6.930428], [107.612932, -6.929788], [107.612968, -6.929409], [107.613061, -6.928426], [107.613064, -6.927535], [107.613086, -6.927077], [107.613084, -6.926826], [107.612923, -6.926555], [107.612596, -6.926166], [107.611872, -6.925277], [107.61177, -6.925134], [107.611729, -6.925015], [107.611715, -6.92489], [107.611715, -6.92473], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]]]}",
"{'type': 'MultiLineString', 'coordinates': [[[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832], [107.625456, -6.910273], [107.625764, -6.910353], [107.625871, -6.910358], [107.626035, -6.910264]]]}",
],
}
)
df
我想在 og 数据帧中分别包含一些单行字符串的结果,例如 [107.625764, -6.910353]、[107.625871, -6.910358],拆分为 107.625764, -6.910353 。预期结果的详细信息如下图所示。 Expected Results
据我所知,我们可以应用 str.split 方法并指定任何特定的分隔符。方法如下:
df[
["coordinate1", "coordinate2", "coordinate3", "coordinate4", "coordinate-n"]
] = df.geometry.str.split(
" ",
expand=True,
)
问题:我不知道应该在 str.split(" ").
之后放置正确的分隔符如何操作 DataFrame 列中的值,直到得到预期的 table,如下图所示? Expected Results
我有一个纯 python 的解决方案。
首先geometry
是一个类似JSON的字符串,但是它有语法错误关键是单引号,JSON需要双引号,所以我用yaml
解析它。
然后我只需要格式化它来列出数据
import yaml
df = pd.DataFrame({ 'id':[8233037,8233313],
'geometry': ["{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612386, -6.93127], [107.612681, -6.930843], [107.612796, -6.930606], [107.61285, -6.930428], [107.612932, -6.929788], [107.612968, -6.929409], [107.613061, -6.928426], [107.613064, -6.927535], [107.613086, -6.927077], [107.613084, -6.926826], [107.612923, -6.926555], [107.612596, -6.926166], [107.611872, -6.925277], [107.61177, -6.925134], [107.611729, -6.925015], [107.611715, -6.92489], [107.611715, -6.92473], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]]]}","{'type': 'MultiLineString', 'coordinates': [[[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832], [107.625456, -6.910273], [107.625764, -6.910353], [107.625871, -6.910358], [107.626035, -6.910264]]]}"]})
data = []
for _,row in df.iterrows():
id = row['id']
geo = yaml.load(row['geometry'])['coordinates']
geos = []
for g in geo:
geos += g
data += [[id,g[0],g[1]] for g in geos]
df_new = pd.DataFrame(data,columns=['id','latitude','longtitude'])
df_new
id latitude longtitude
0 8233037 107.612018 -6.921755
1 8233037 107.611888 -6.923030
2 8233037 107.611715 -6.924730
3 8233037 107.611715 -6.924890
4 8233037 107.611729 -6.925015
... ... ... ...
199 8233313 107.623747 -6.909832
200 8233313 107.625456 -6.910273
201 8233313 107.625764 -6.910353
202 8233313 107.625871 -6.910358
203 8233313 107.626035 -6.910264
204 rows × 3 columns
geometry
看起来像 JSON
字符串,所以首先我会使用模块 json
将它从字符串转换为普通 list/dict。晚些时候你可以更简单地访问值。
但它不正确 JSON
所以我可以为此使用模块 dirtyjson
df['data'] = df['geometry'].apply(lambda row:dirtyjson.loads(row))
print(df['data'])
或者(幸运的是)我可以用 "
替换 '
以获得正确的 JSON
df['data'] = df['geometry'].apply(lambda row:json.loads(row.replace("'", '"')))
结果
0 {'type': 'MultiLineString', 'coordinates': [[[...
1 {'type': 'MultiLineString', 'coordinates': [[[...
Name: data, dtype: object
接下来我只得到coordinates
df['coordinates'] = df['data'].apply(lambda row:row['coordinates'])
print(df['coordinates'])
结果
0 [[[107.612018, -6.921755], [107.611888, -6.923...
1 [[[107.614077, -6.91033], [107.614837, -6.9100...
Name: coordinates, dtype: object
它是嵌套列表,所以我把它压扁了
def flatten(row):
result = []
for item in row:
result += item
return result
df['coordinates'] = df['coordinates'].apply(flatten)
print(df['coordinates'])
或者我可以使用 sum()
和 []
作为起始值
df['coordinates'] = df['coordinates'].apply(lambda row: sum(row, []))
结果
0 [[107.612018, -6.921755], [107.611888, -6.9230...
1 [[107.614077, -6.91033], [107.614837, -6.91005...
Name: coordinates, dtype: object
现在我可以 explode
将每一对放在单独的行中 id
df = df.explode('coordinates')
print(df[['id', 'coordinates']])
结果
id coordinates
0 8233037 [107.612018, -6.921755]
0 8233037 [107.611888, -6.92303]
0 8233037 [107.611715, -6.92473]
0 8233037 [107.611715, -6.92489]
0 8233037 [107.611729, -6.925015]
.. ... ...
1 8233313 [107.623747, -6.909832]
1 8233313 [107.625456, -6.910273]
1 8233313 [107.625764, -6.910353]
1 8233313 [107.625871, -6.910358]
1 8233313 [107.626035, -6.910264]
[204 rows x 2 columns]
我可以使用 Series
将 coordinates
转换成两行
df[ ['lat', 'long'] ] = df['coordinates'].apply(pd.Series)
print(df)
结果:
id lat long
0 8233037 107.612018 -6.921755
0 8233037 107.611888 -6.923030
0 8233037 107.611715 -6.924730
0 8233037 107.611715 -6.924890
0 8233037 107.611729 -6.925015
.. ... ... ...
1 8233313 107.623747 -6.909832
1 8233313 107.625456 -6.910273
1 8233313 107.625764 -6.910353
1 8233313 107.625871 -6.910358
1 8233313 107.626035 -6.910264
[204 rows x 3 columns]
完整的工作代码
import pandas as pd
import json
import dirtyjson
df = pd.DataFrame({
'id': [8233037, 8233313],
'geometry': ["{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612386, -6.93127], [107.612681, -6.930843], [107.612796, -6.930606], [107.61285, -6.930428], [107.612932, -6.929788], [107.612968, -6.929409], [107.613061, -6.928426], [107.613064, -6.927535], [107.613086, -6.927077], [107.613084, -6.926826], [107.612923, -6.926555], [107.612596, -6.926166], [107.611872, -6.925277], [107.61177, -6.925134], [107.611729, -6.925015], [107.611715, -6.92489], [107.611715, -6.92473], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]]]}","{'type': 'MultiLineString', 'coordinates': [[[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832], [107.625456, -6.910273], [107.625764, -6.910353], [107.625871, -6.910358], [107.626035, -6.910264]]]}"]
})
#df['data'] = df['geometry'].apply(lambda row:dirtyjson.loads(row))
df['data'] = df['geometry'].apply(lambda row:json.loads(row.replace("'", '"')))
print(df['data'])
df['coordinates'] = df['data'].apply(lambda row:row['coordinates'])
print(df['coordinates'])
def flatten(row):
result = []
for item in row:
result += item
return result
#df['coordinates'] = df['coordinates'].apply(flatten)
df['coordinates'] = df['coordinates'].apply(lambda row: sum(row, []))
print(df['coordinates'])
df = df.explode('coordinates')
print(df[['id', 'coordinates']])
df[ ['lat', 'long'] ] = df['coordinates'].apply(pd.Series)
print(df[['id', 'lat', 'long']])