Converting a list of lists (nested list) into a DataFrame
I have a nested list in which each 'nested list' is a group of rows that fall within a 10-minute interval (grouped using the timedelta function).
print(groups)
-----------------------results---------------------------
[[[1, Timestamp('2019-01-22 00:54:27')]],
 [[2, Timestamp('2019-01-22 08:37:04')]],
 [[3, Timestamp('2019-01-22 10:57:40')],
  [4, Timestamp('2019-01-22 10:57:43')]],
 [[5, Timestamp('2019-01-22 11:09:07')],
  [6, Timestamp('2019-01-22 11:16:18')],
  [7, Timestamp('2019-01-22 11:16:23')],
  [8, Timestamp('2019-01-22 11:16:25')]],
 [[9, Timestamp('2019-01-22 11:35:03')],
  [10, Timestamp('2019-01-22 11:35:35')]]...
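This information originally came from a full DataFrame. I am trying to convert this nested list into a DataFrame with a separate column named "Group". That is, the first 'nested list' would be group 1 and the second 'nested list' would be group 2.
I considered converting to a dictionary before converting to a DataFrame, but these attempts did not work either: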
{k:row[0] for row in groups for k in row[1:]}
dict((k[0], k[1:]) for k in groups)
error: unhashable type: 'list'
In short, I would like to write a function that automates the following process:
df_1 = pd.DataFrame(groups[1],columns=['index','datetime'])
df_1['group']=1
df_2 = pd.DataFrame(groups[2],columns=['index','datetime'])
df_2['group']=2
df_3 = pd.DataFrame(groups[3],columns=['index','datetime'])
df_3['group']=3
... etc...
I would love to hear how you would approach this! Thanks.
Is this what you are trying to achieve?
import pandas as pd
from pandas import Timestamp

# Sample data: each inner list is one group of [id, time] rows.
l = [
    [[1, Timestamp('2019-01-22 00:54:27')]],
    [[2, Timestamp('2019-01-22 08:37:04')]],
    [[3, Timestamp('2019-01-22 10:57:40')],
     [4, Timestamp('2019-01-22 10:57:43')]],
    [[5, Timestamp('2019-01-22 11:09:07')],
     [6, Timestamp('2019-01-22 11:16:18')],
     [7, Timestamp('2019-01-22 11:16:23')],
     [8, Timestamp('2019-01-22 11:16:25')]],
    [[9, Timestamp('2019-01-22 11:35:03')],
     [10, Timestamp('2019-01-22 11:35:35')]],
]

# One DataFrame: append the group number g (0-based, from enumerate) to every row.
df = pd.DataFrame(
    [e1 + [g] for g, e0 in enumerate(l) for e1 in e0],
    columns=['id', 'time', 'group'],
)
print('As one DataFrame:\n', df)

# Separate DataFrames: build one frame per group.
dfs = [
    pd.DataFrame(
        [e1 + [g] for e1 in e0],
        columns=['id', 'time', 'group'],
    )
    for g, e0 in enumerate(l)
]
print('\nAs separate DataFrames:')
for group_df in dfs:
    print('---------------------')
    print(group_df)
Output
As one DataFrame:
    id                time  group
0   1 2019-01-22 00:54:27      0
1   2 2019-01-22 08:37:04      1
2   3 2019-01-22 10:57:40      2
3   4 2019-01-22 10:57:43      2
4   5 2019-01-22 11:09:07      3
5   6 2019-01-22 11:16:18      3
6   7 2019-01-22 11:16:23      3
7   8 2019-01-22 11:16:25      3
8   9 2019-01-22 11:35:03      4
9  10 2019-01-22 11:35:35      4

As separate DataFrames:
---------------------
   id                time  group
0   1 2019-01-22 00:54:27      0
---------------------
   id                time  group
0   2 2019-01-22 08:37:04      1
---------------------
   id                time  group
0   3 2019-01-22 10:57:40      2
1   4 2019-01-22 10:57:43      2
---------------------
   id                time  group
0   5 2019-01-22 11:09:07      3
1   6 2019-01-22 11:16:18      3
2   7 2019-01-22 11:16:23      3
3   8 2019-01-22 11:16:25      3
---------------------
   id                time  group
0   9 2019-01-22 11:35:03      4
1  10 2019-01-22 11:35:35      4
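The question asks for a function and for groups numbered from 1, while the code above numbers them from 0. The following is a minimal sketch of one way to wrap the same idea in a function. The name groups_to_frame and the first_group parameter are made up for illustration and do not come from the original posts. It builds one small frame per group and joins them with pd.concat, rather than using the list comprehension shown above.

```python
import pandas as pd
from pandas import Timestamp


def groups_to_frame(groups, first_group=1):
    """Flatten a list of groups (each a list of [id, time] rows) into one DataFrame.

    Hypothetical helper: the name and the first_group parameter are assumptions.
    """
    frames = [
        # One frame per group, tagged with its group number.
        pd.DataFrame(rows, columns=["id", "time"]).assign(group=number)
        for number, rows in enumerate(groups, start=first_group)
    ]
    # Note: pd.concat raises ValueError if `groups` is empty.
    return pd.concat(frames, ignore_index=True)


# Shortened sample data in the same shape as the question's `groups`.
groups = [
    [[1, Timestamp("2019-01-22 00:54:27")]],
    [[3, Timestamp("2019-01-22 10:57:40")],
     [4, Timestamp("2019-01-22 10:57:43")]],
]

df = groups_to_frame(groups)
print(df)
```

With the result in a single frame, one group can be selected with a filter such as df[df["group"] == 2], which may remove the need for separate df_1, df_2, ... variables.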