How to turn a list of lists into columns of a pandas dataframe?
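I would like to ask how to unnest a list of lists and turn it into separate columns of a dataframe. Specifically, I have the following dataframe, where the Route_set column is a list of lists: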
Generation Route_set
0 0 [[20. 19. 47. 56.] [21. 34. 78. 34.]]
The desired output is the following dataframe:
route1 route2
0 20 21
1 19 34
2 47 78
3 56 34
Is there any way to do this? Thanks in advance!
You can create a dictionary and fill it with a for loop. It is not the fastest approach, but it is very simple.
import pandas as pd

new_dic = {}
# Create and fill the dictionary; each key-value pair corresponds to one inner list.
# Route_set holds the whole list of lists in a single cell, so iterate over that cell.
for i, values in enumerate(df.Route_set.iloc[0], start=1):
    new_dic[f'route{i}'] = values
# Build the result from the dictionary: one column per inner list
result = pd.DataFrame(new_dic)
You can probably do better, but this should solve your problem quickly!
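The dictionary idea can also be applied row by row when the dataframe has more than one row. The following is only a sketch under assumptions: the second row of data, the helper name `unnest_routes`, and the `position` index level are all hypothetical and do not come from the question, which shows a single row.

```python
import pandas as pd

# Hypothetical input with two rows (the question only shows one)
df = pd.DataFrame({
    'Generation': [0, 1],
    'Route_set': [
        [[20., 19., 47., 56.], [21., 34., 78., 34.]],
        [[1., 2., 3., 4.], [5., 6., 7., 8.]],
    ],
})

def unnest_routes(frame, column='Route_set', prefix='route'):
    """Build one block of route columns per row and stack the blocks."""
    blocks = {}
    for generation, cell in zip(frame['Generation'], frame[column]):
        # One column per inner list, named route1, route2, ...
        blocks[generation] = pd.DataFrame(
            {f'{prefix}{i}': list(values) for i, values in enumerate(cell, start=1)}
        )
    # The dict keys (Generation values) become the outer index level
    return pd.concat(blocks, names=['Generation', 'position'])

result = unnest_routes(df)
print(result)
```

Keeping Generation as the outer index level makes it possible to pull out one generation's routes with `result.loc[0]`. The sketch assumes every row has the same number of inner lists; otherwise the missing columns would be filled with NaN.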
You can try using df.explode and df.apply:
import pandas as pd
df = pd.DataFrame(data= {'Generation': 0, 'Route_set':[[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})
df['route1']=df['Route_set'].apply(lambda x: x[0])
df['route2']=df['Route_set'].apply(lambda x: x[1])
df = df.explode(['route1', 'route2'], ignore_index=True)
df2 = df[df.columns.difference(['Route_set', 'Generation'])]
| | route1 | route2 |
|---:|---------:|---------:|
| 0 | 20 | 21 |
| 1 | 19 | 34 |
| 2 | 47 | 78 |
| 3 | 56 | 34 |
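The two apply calls above hard-code route1 and route2. A minimal sketch of how the same explode idea might generalize to any number of inner lists, assuming each cell is a plain Python list of equal-length lists and pandas 1.3 or later (needed for multi-column explode). The names `expanded` and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(data={'Generation': 0, 'Route_set': [[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})

# Spread the inner lists across columns: one row per original row,
# one column per inner list, each cell still holding a whole list
expanded = pd.DataFrame(df['Route_set'].tolist(), index=df.index)
expanded.columns = [f'route{i}' for i in range(1, expanded.shape[1] + 1)]

# Explode every column at once so each list element gets its own row
result = expanded.explode(list(expanded.columns), ignore_index=True)
print(result)
```

The exploded columns keep the object dtype; `result.astype(float)` can be appended if numeric columns are needed.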
Alternatively, you can create a new dataframe from the values, as follows:
import pandas as pd
df = pd.DataFrame(data= {'Generation': 0, 'Route_set':[[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})
df1 = pd.DataFrame.from_dict(dict(zip(['route1', 'route2'], df.Route_set.to_numpy()[0])), orient='index').transpose()
| | route1 | route2 |
|---:|---------:|---------:|
| 0 | 20 | 21 |
| 1 | 19 | 34 |
| 2 | 47 | 78 |
| 3 | 56 | 34 |
Update 1:
import pandas as pd
df = pd.DataFrame(data= {'Generation': 0, 'Route_set':[
[[20.0, 19.0, 47.0, 56.0, 43.0, 53.0, 18.0, -1.0, -1.0, -1.0, -1.0, -1.0], [20.0, 51.0, 46.0, 37.0, 2.0, 57.0, 49.0, 36.0, 25.0, 5.0, 4.0, 34.0], [54.0, 23.0, 5.0, 46.0, 34.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [57.0, 48.0, 46.0, 35.0, 25.0, 27.0, 52.0, 8.0, 39.0, 22.0, 51.0, 28.0], [57.0, 16.0, 45.0, 25.0, 49.0, 38.0, 0.0, 46.0, 13.0, 18.0, 19.0, 20.0], [21.0, 11.0, 6.0, 33.0, 25.0, 49.0, 57.0, 29.0, 12.0, 3.0, -1.0, -1.0], [9.0, 15.0, 47.0, 42.0, 25.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [51.0, 25.0, 22.0, 14.0, 39.0, 8.0, 40.0, 0.0, 10.0, 26.0, 32.0, 47.0], [1.0, 33.0, 24.0, 46.0, 56.0, 30.0, 48.0, 51.0, -1.0, -1.0, -1.0, -1.0], [25.0, 31.0, 50.0, 17.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [57.0, 12.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [20.0, 41.0, 47.0, 15.0, 46.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [14.0, 44.0, 39.0, 25.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [20.0, 51.0, 25.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [57.0, 49.0, 5.0, 20.0, 37.0, 46.0, 36.0, 25.0, 39.0, 51.0, 48.0, -1.0], [5.0, 0.0, 33.0, 55.0, 25.0, 48.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [51.0, 32.0, 33.0, 24.0, 35.0, 8.0, 25.0, 4.0, 46.0, 1.0, 7.0, -1.0], [5.0, 25.0, 34.0, 46.0, 1.0, 9.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [38.0, 57.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [12.0, 57.0, 49.0, 25.0, 9.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0]],
]})
# The single Route_set cell holds every route
data = df.Route_set.to_numpy()[0]
# Name the columns route1 ... routeN, one per inner list
df = pd.DataFrame.from_dict(dict(zip(['route{}'.format(i) for i in range(1, len(data)+1)], data)), orient='index').transpose()
print(df.to_markdown())
| | route1 | route2 | route3 | route4 | route5 | route6 | route7 | route8 | route9 | route10 | route11 | route12 | route13 | route14 | route15 | route16 | route17 | route18 | route19 | route20 |
|---:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|
| 0 | 20 | 20 | 54 | 57 | 57 | 21 | 9 | 51 | 1 | 25 | 57 | 20 | 14 | 20 | 57 | 5 | 51 | 5 | 38 | 12 |
| 1 | 19 | 51 | 23 | 48 | 16 | 11 | 15 | 25 | 33 | 31 | 12 | 41 | 44 | 51 | 49 | 0 | 32 | 25 | 57 | 57 |
| 2 | 47 | 46 | 5 | 46 | 45 | 6 | 47 | 22 | 24 | 50 | -1 | 47 | 39 | 25 | 5 | 33 | 33 | 34 | -1 | 49 |
| 3 | 56 | 37 | 46 | 35 | 25 | 33 | 42 | 14 | 46 | 17 | -1 | 15 | 25 | -1 | 20 | 55 | 24 | 46 | -1 | 25 |
| 4 | 43 | 2 | 34 | 25 | 49 | 25 | 25 | 39 | 56 | -1 | -1 | 46 | -1 | -1 | 37 | 25 | 35 | 1 | -1 | 9 |
| 5 | 53 | 57 | -1 | 27 | 38 | 49 | -1 | 8 | 30 | -1 | -1 | -1 | -1 | -1 | 46 | 48 | 8 | 9 | -1 | -1 |
| 6 | 18 | 49 | -1 | 52 | 0 | 57 | -1 | 40 | 48 | -1 | -1 | -1 | -1 | -1 | 36 | -1 | 25 | -1 | -1 | -1 |
| 7 | -1 | 36 | -1 | 8 | 46 | 29 | -1 | 0 | 51 | -1 | -1 | -1 | -1 | -1 | 25 | -1 | 4 | -1 | -1 | -1 |
| 8 | -1 | 25 | -1 | 39 | 13 | 12 | -1 | 10 | -1 | -1 | -1 | -1 | -1 | -1 | 39 | -1 | 46 | -1 | -1 | -1 |
| 9 | -1 | 5 | -1 | 22 | 18 | 3 | -1 | 26 | -1 | -1 | -1 | -1 | -1 | -1 | 51 | -1 | 1 | -1 | -1 | -1 |
| 10 | -1 | 4 | -1 | 51 | 19 | -1 | -1 | 32 | -1 | -1 | -1 | -1 | -1 | -1 | 48 | -1 | 7 | -1 | -1 | -1 |
| 11 | -1 | 34 | -1 | 28 | 20 | -1 | -1 | 47 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
I came up with a solution that creates a NumPy array(), transposes it, and converts it back to a list of lists using tolist():
import numpy as np
import pandas as pd
routes = {
"Generation": 0,
"Route_set": [[[20, 19, 47, 56], [21, 34, 78, 34]]]
}
array = np.array(routes["Route_set"][0]).T.tolist()
columns_name = [f"route{i}" for i in range(1, len(array[0])+1)]
df = pd.DataFrame(data=array, columns=columns_name)
print(df)
Output:
route1 route2
0 20 21
1 19 34
2 47 78
3 56 34
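As a side note, the round trip through tolist() is optional, since pd.DataFrame accepts a 2D NumPy array directly. A small sketch of that variant, assuming all inner lists have the same length (otherwise np.array cannot build a regular 2D array); the variable name `route_set` is illustrative.

```python
import numpy as np
import pandas as pd

route_set = [[20, 19, 47, 56], [21, 34, 78, 34]]

# Transposing turns each inner list into a column of the 2D array
array = np.array(route_set).T

# One column name per inner list: route1, route2, ...
columns = [f"route{i}" for i in range(1, array.shape[1] + 1)]
df = pd.DataFrame(array, columns=columns)
print(df)
```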