pandas group by value & create new data frames?
I have the following sample data frame:
id shcool_id time_created
710 1045152 2019-07-26 15:10:26
5141 6853654 2020-10-07 11:32:30
2278 3460257 2019-11-01 17:31:11
3877 2186089 2020-02-14 14:53:43
3877 1841367 2020-02-14 14:53:43
2019 3266938 2019-11-01 12:40:35
4910 1608407 2020-09-21 15:47:40
3926 4480633 2020-02-14 16:07:04
3447 5416477 2020-01-17 13:13:36
I want to group this data frame by id, so that I end up with several data frames, for example:
df1=id shcool_id time_created
710 1045152 2019-07-26 15:10:26
df2=id shcool_id time_created
5141 6853654 2020-10-07 11:32:30
df3=id shcool_id time_created
2278 3460257 2019-11-01 17:31:11
df4=id shcool_id time_created
3877 2186089 2020-02-14 14:53:43
3877 1841367 2020-02-14 14:53:43
df5=id shcool_id time_created
2019 3266938 2019-11-01 12:40:35
df6=id shcool_id time_created
4910 1608407 2020-09-21 15:47:40
df7=id shcool_id time_created
3926 4480633 2020-02-14 16:07:04
df8=id shcool_id time_created
3447 5416477 2020-01-17 13:13:36
df9=id shcool_id time_created
1935 2788320 2019-10-31 14:10:46
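I don't know how many unique ids there are, so I was wondering whether there is a way to handle this.
Sorry if this has been asked before. I did search, but maybe I wasn't searching for the right phrase ¯\_(ツ)_/¯
Thanks in advance!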
If you want the data frames to be available as global variables, you have to assign them into globals():
>>> for i, (_, v) in enumerate(df.groupby('id'), start=1):
...     globals()[f'df{i}'] = v
# Now all the new dfs will be available globally
>>> df1
id shcool_id time_created
0 710 1045152 2019-07-26 15:10:26
But it is better to create a dict:
>>> database = {f'df{i}': v for i, (_, v) in enumerate(df.groupby('id'), start=1)}
>>> database['df1']
id shcool_id time_created
0 710 1045152 2019-07-26 15:10:26
If you want to be able to access the dfs by their group key (the id value):
>>> database = dict(list(df.groupby('id')))
>>> database[710]
id shcool_id time_created
0 710 1045152 2019-07-26 15:10:26
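For anyone who wants to try the dict approach end to end, here is a minimal self-contained sketch. The rows are copied from the question; the name `frames_by_id` is just illustrative and not from the original answer. Note that `groupby` sorts the group keys by default, so any numbering derived from it follows sorted id order rather than the order in which ids first appear.

```python
import pandas as pd

# Sample rows copied from the question (column name spelled as in the original).
df = pd.DataFrame(
    {
        "id": [710, 5141, 2278, 3877, 3877, 2019, 4910, 3926, 3447],
        "shcool_id": [1045152, 6853654, 3460257, 2186089, 1841367,
                      3266938, 1608407, 4480633, 5416477],
        "time_created": pd.to_datetime([
            "2019-07-26 15:10:26", "2020-10-07 11:32:30", "2019-11-01 17:31:11",
            "2020-02-14 14:53:43", "2020-02-14 14:53:43", "2019-11-01 12:40:35",
            "2020-09-21 15:47:40", "2020-02-14 16:07:04", "2020-01-17 13:13:36",
        ]),
    }
)

# One sub-frame per distinct id, keyed by the id value itself.
# `frames_by_id` is a hypothetical name chosen for this sketch.
frames_by_id = {key: group for key, group in df.groupby("id")}

# Works for any number of unique ids; no need to know the count in advance.
for key, group in frames_by_id.items():
    print(key, len(group))
```

Looking a frame up is then just `frames_by_id[3877]`, which in this sample holds the two rows sharing that id.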
Here df is your original data frame. df_list will hold the list of all data frames, split by id:
df_list = []
uniq_ids = df.id.unique()
# loop variable named uid so the built-in id() is not shadowed
for uid in uniq_ids:
    new_df = df[df.id == uid]
    df_list.append(new_df)
Sample output:
df_list[2]
id shcool_id time_created
2 2278 3460257 2019-11-01 17:31:11
df_list[3]
id shcool_id time_created
3 3877 2186089 2020-02-14 14:53:43
4 3877 1841367 2020-02-14 14:53:43
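The same list can probably be built in one pass with `groupby`, which avoids re-filtering the whole frame once per id. This is a sketch rather than part of the original answer; it uses a shortened copy of the sample data and relies on `sort=False`, which keeps groups in order of first appearance and so matches the order of `df.id.unique()`.

```python
import pandas as pd

# Shortened copy of the sample data from the question.
df = pd.DataFrame(
    {
        "id": [710, 5141, 2278, 3877, 3877],
        "shcool_id": [1045152, 6853654, 3460257, 2186089, 1841367],
    }
)

# sort=False keeps the groups in order of first appearance,
# so the list positions line up with the loop over df.id.unique().
df_list = [group for _, group in df.groupby("id", sort=False)]

print(df_list[3])
```

Each element keeps its original row index, so `df_list[3]` still shows index labels 3 and 4.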