Pandas: How to split tuple data in a column and create multiple columns
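I created a column that holds the country name together with its latitude and longitude values in a single column. Now I want the latitude and longitude values in separate columns.
Code used to create the column: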
df['Country_cord'] = df['Country'].apply(geolocator.geocode)
This is what the output looks like:
0 (España, (40.0028028, -4.003104))
1 (United Kingdom, دبي, الإمارات العربيّة المتّ...
2 (France métropolitaine, France, (46.603354, 1....
3 (United States of America, (39.7837304, -100.4...
4 (Italia, (42.6384261, 12.674297))
5 (Deutschland, Europe, (51.0834196, 10.4234469))
6 (Argentina, (-34.9964963, -64.9672817))
7 (Ireland, (52.865196, -7.9794599))
8 (België / Belgique / Belgien, (50.6407351, 4.6...
9 (מדינת ישראל, (30.8760272, 35.0015196))
10 (Schweiz/Suisse/Svizzera/Svizra, (46.7985624, ...
11 (Nederland, (52.2379891, 5.53460738161551))
12 (Brasil, (-10.3333333, -53.2))
13 (Portugal, (40.033265, -7.8896263))
14 (Australia, (-24.7761086, 134.755))
15 (Danmark, (55.670249, 10.3333283))
16 (Maroc ⵍⵎⵖⵔⵉⴱ المغرب, (31.1728192, -7.3366043))
17 (Ciudad de México, Cuauhtémoc, CDMX, 06060, Mé...
18 (Canada, (61.0666922, -107.9917071))
19 (Sverige, (59.6749712, 14.5208584))
I want the output as one column of latitudes and one column of longitudes:
df['lat'] df['lon']
40.0028028 -4.003104
46.603354 1.8883335
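I think you can use str[] twice: first to select the second element of each result (the nested coordinate tuple), then to select the first and second elements of that nested tuple: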
s = df['Country'].apply(geolocator.geocode).str[1]
df['lat'] = s.str[0]
df['lon'] = s.str[1]
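To try the double str[] indexing without calling a geocoder, here is a minimal sketch. It assumes each cell is a plain (name, (lat, lon)) tuple; the two-row frame is a made-up sample built from rows of the question's output, and the final cast to float is an optional extra, since str[] returns an object-dtype column.

```python
import pandas as pd

# Hypothetical sample: plain (name, (lat, lon)) tuples stand in for geocoder results.
df = pd.DataFrame({
    'Country_cord': [
        ('España', (40.0028028, -4.003104)),
        ('Italia', (42.6384261, 12.674297)),
    ]
})

# .str[1] picks the nested (lat, lon) tuple from every row.
coords = df['Country_cord'].str[1]

# Index into the nested tuple again, then cast from object dtype to float.
df['lat'] = coords.str[0].astype(float)
df['lon'] = coords.str[1].astype(float)

print(df[['lat', 'lon']])
```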
Or use the DataFrame constructor:
s = df['Country'].apply(geolocator.geocode).str[1]
df = df.join(pd.DataFrame(s.values.tolist(), columns=['lat', 'lon']))
Sample:
print (df)
Country
0 (Canada, (61.0666922, -107.9917071))
1 (Sverige, (59.6749712, 14.5208584))
s = df['Country'].str[1]
df = df.join(pd.DataFrame(s.values.tolist(), columns=['lat', 'lon']))
print (df)
Country lat lon
0 (Canada, (61.0666922, -107.9917071)) 61.066692 -107.991707
1 (Sverige, (59.6749712, 14.5208584)) 59.674971 14.520858
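The constructor approach builds the new frame with a fresh 0..n-1 index and expects every row to hold a coordinate tuple. The sketch below is one possible way to cope when that does not hold. It rests on two assumptions that are not part of the answer above: that a failed lookup leaves None in the cell, and that df may carry a non-default index.

```python
import pandas as pd

# Hypothetical sample: None stands in for a lookup that found nothing,
# and the index is deliberately not 0..n-1.
df = pd.DataFrame(
    {'Country': [('Canada', (61.0666922, -107.9917071)),
                 None,
                 ('Sverige', (59.6749712, 14.5208584))]},
    index=[10, 11, 12],
)

# Take the nested (lat, lon) tuple and drop rows that have no result.
s = df['Country'].str[1].dropna()

# Reuse the surviving index so that join lines the rows up again.
coords = pd.DataFrame(s.tolist(), index=s.index, columns=['lat', 'lon'])
df = df.join(coords)

print(df[['lat', 'lon']])
```

Rows without a result end up with NaN in both new columns instead of raising an error.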
A zipped generator expression over the NumPy array also works well for this:
import pandas as pd
df = pd.DataFrame([[('Country1', (341.123, 4534.123))],
                   [('Country2', (341.123, 4534.123))],
                   [('Country3', (341.123, 4534.123))],
                   [('Country4', (341.123, 4534.123))]],
                  columns=['Series1'])
df['Lat'], df['Lon'] = list(zip(*((x[1][0], x[1][1]) for x in df['Series1'].values)))
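Because every row in that example carries the same coordinates, the unpacking order is hard to see. The following sketch repeats the zip technique on a made-up frame with distinct values, again assuming plain (name, (lat, lon)) tuples in the column.

```python
import pandas as pd

# Hypothetical sample with distinct coordinates per row.
df = pd.DataFrame({'Series1': [('Canada', (61.0666922, -107.9917071)),
                               ('Sverige', (59.6749712, 14.5208584))]})

# Each generator item is a (lat, lon) pair; zip(*...) transposes the pairs
# into one tuple of latitudes and one tuple of longitudes.
lats, lons = zip(*(x[1] for x in df['Series1'].values))

df['Lat'] = list(lats)
df['Lon'] = list(lons)

print(df[['Lat', 'Lon']])
```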