Pandas sorted groupby returns a Series whose data I can't access
Sample data:
user_id content_id date
0 user_44289 cont_3375_16_10 2020-03-06
1 user_44289 cont_1195_1_8 2019-04-18
2 user_44289 cont_3470_2_15 2021-09-18
3 user_44289 cont_310_25_9 2020-09-08
4 user_44289 cont_4350_1_3 2021-06-25
5 user_40584 cont_1399_27_6 2018-11-14
6 user_40584 cont_1808_2_4 2021-05-07
7 user_40584 cont_2615_7_24 2021-10-14
I'm grouping and sorting with the pandas query below. It returns all_users_list, which is of type pandas.core.series.Series:
all_users_list = final_data.sort_values(by=['user_id','date','content_id'], ascending=False).groupby(['user_id','date','content_id'], sort=False)['user_id','content_id','date'].apply(list)
Output:
user_id     date        content_id
user_99974  2021-10-09  cont_4104_7_52    [user_id, content_id, date]
            2021-10-04  cont_2253_6_4     [user_id, content_id, date]
            2021-08-30  cont_2311_4_4     [user_id, content_id, date]
            2021-07-22  cont_676_5_31     [user_id, content_id, date]
            2021-05-28  cont_2456_6_1     [user_id, content_id, date]
...
user_10013  2018-12-04  cont_2597_6_8     [user_id, content_id, date]
            2018-09-11  cont_2233_3_8     [user_id, content_id, date]
            2018-08-13  cont_300_1_1      [user_id, content_id, date]
            2018-04-10  cont_2244_16_1    [user_id, content_id, date]
            2018-02-03  cont_3189_6_12    [user_id, content_id, date]
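Why every cell is [user_id, content_id, date]: apply(list) hands each group to list() as a sub-DataFrame, and iterating over a DataFrame yields its column labels, not its rows. The minimal reproduction below assumes the sample rows above, with date stored as ISO strings (the question doesn't show the dtype). Depending on the pandas version, the tuple-style selection ['user_id','content_id','date'] after groupby either warns or raises, and a list ([[...]]) is required.

```python
import pandas as pd

# Sample rows from the question; `date` kept as ISO strings (an assumption)
final_data = pd.DataFrame({
    "user_id": ["user_44289"] * 5 + ["user_40584"] * 3,
    "content_id": ["cont_3375_16_10", "cont_1195_1_8", "cont_3470_2_15",
                   "cont_310_25_9", "cont_4350_1_3", "cont_1399_27_6",
                   "cont_1808_2_4", "cont_2615_7_24"],
    "date": ["2020-03-06", "2019-04-18", "2021-09-18", "2020-09-08",
             "2021-06-25", "2018-11-14", "2021-05-07", "2021-10-14"],
})

# Iterating over a DataFrame yields its column labels, not its rows
column_labels = list(final_data)
print(column_labels)  # ['user_id', 'content_id', 'date']

# groupby(...).apply(list) calls list() on each group's sub-DataFrame,
# so every group collapses to the same list of column labels
per_group = {key: list(group) for key, group in final_data.groupby("user_id")}
print(per_group)
```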
But I need to access the data in the 3 columns user_id, content_id and date from this all_users_list.
result = all_users_list.values.tolist()
result[0:10]
It always returns the data below, but I need the actual values shown above, with the grouped 'user_id', 'date' and 'content_id':
[['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date'],
['user_id', 'content_id', 'date']]
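If the goal is simply the sorted rows as lists, one option is to skip the groupby/apply(list) step and take the underlying values of the sorted frame. This is a sketch under the same assumptions as the sample data (date as ISO strings, so string order equals date order). The per-user dictionary rows_by_user is a hypothetical convenience, not part of the original query.

```python
import pandas as pd

# Sample rows from the question; `date` kept as ISO strings (an assumption)
final_data = pd.DataFrame({
    "user_id": ["user_44289"] * 5 + ["user_40584"] * 3,
    "content_id": ["cont_3375_16_10", "cont_1195_1_8", "cont_3470_2_15",
                   "cont_310_25_9", "cont_4350_1_3", "cont_1399_27_6",
                   "cont_1808_2_4", "cont_2615_7_24"],
    "date": ["2020-03-06", "2019-04-18", "2021-09-18", "2020-09-08",
             "2021-06-25", "2018-11-14", "2021-05-07", "2021-10-14"],
})

cols = ["user_id", "content_id", "date"]

# Sort with the same keys and direction as the question, keep the three
# columns, and convert the underlying values: each inner list is one real row
sorted_df = final_data.sort_values(by=["user_id", "date", "content_id"], ascending=False)
sorted_rows = sorted_df[cols].to_numpy().tolist()
print(sorted_rows[0])  # ['user_44289', 'cont_3470_2_15', '2021-09-18']

# Per-user variant: groupby keeps the sorted row order inside each group,
# and sort=False keeps the users in the order they first appear
rows_by_user = {
    user: group[["content_id", "date"]].to_numpy().tolist()
    for user, group in sorted_df.groupby("user_id", sort=False)
}
print(rows_by_user["user_40584"])
```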
Please help me solve this. Thanks.
Update:
def getContent(user):
    indices = np.where(result == 'user_10013')
    return result[indices][1]  ## this should return the list of content_id for the retrieved user_id 'user_10013'
But printing the result always shows ['user_id', 'content_id', 'date'].
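Two things get in the way in getContent. First, result is a plain Python list, so result == 'user_10013' is a single False rather than an element-wise mask, and np.where has nothing to locate. Second, the user argument is never used. The sketch below filters the DataFrame with a boolean mask instead. get_content is a hypothetical helper, and it uses user_40584 because user_10013 is not in the sample rows.

```python
import pandas as pd

# Sample rows from the question; `date` kept as ISO strings (an assumption)
final_data = pd.DataFrame({
    "user_id": ["user_44289"] * 5 + ["user_40584"] * 3,
    "content_id": ["cont_3375_16_10", "cont_1195_1_8", "cont_3470_2_15",
                   "cont_310_25_9", "cont_4350_1_3", "cont_1399_27_6",
                   "cont_1808_2_4", "cont_2615_7_24"],
    "date": ["2020-03-06", "2019-04-18", "2021-09-18", "2020-09-08",
             "2021-06-25", "2018-11-14", "2021-05-07", "2021-10-14"],
})

# A Python list compared with a str is just False (no element-wise test)
rows = final_data.to_numpy().tolist()
list_vs_str = rows == "user_44289"
print(list_vs_str)  # False


def get_content(df: pd.DataFrame, user: str) -> list:
    """Hypothetical helper: the user's content_ids, newest date first."""
    # Boolean mask selects only this user's rows, then sort by date descending
    user_rows = df[df["user_id"] == user].sort_values("date", ascending=False)
    return user_rows["content_id"].tolist()


print(get_content(final_data, "user_40584"))
# ['cont_2615_7_24', 'cont_1808_2_4', 'cont_1399_27_6']
```

An unknown user_id returns an empty list rather than raising.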
Is something like this what you want?
out = df.sort_values('date', ascending=False).groupby('user_id').agg(list)
print(out)
# Output
content_id date
user_id
user_40584 [cont_2615_7_24, cont_1808_2_4, cont_1399_27_6] [2021-10-14, 2021-05-07, 2018-11-14]
user_44289 [cont_3470_2_15, cont_4350_1_3, cont_310_25_9,... [2021-09-18, 2021-06-25, 2020-09-08, 2020-03-0...
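Because out is indexed by user_id and each cell holds a real Python list, the values can be read back with .loc or converted to a dictionary. This is a sketch assuming df holds the sample rows with date as ISO strings. The names as_dict and the chosen users are illustrative only.

```python
import pandas as pd

# Sample rows from the question; `date` kept as ISO strings (an assumption)
df = pd.DataFrame({
    "user_id": ["user_44289"] * 5 + ["user_40584"] * 3,
    "content_id": ["cont_3375_16_10", "cont_1195_1_8", "cont_3470_2_15",
                   "cont_310_25_9", "cont_4350_1_3", "cont_1399_27_6",
                   "cont_1808_2_4", "cont_2615_7_24"],
    "date": ["2020-03-06", "2019-04-18", "2021-09-18", "2020-09-08",
             "2021-06-25", "2018-11-14", "2021-05-07", "2021-10-14"],
})

# Same aggregation as the answer: newest first, one list per column per user
out = df.sort_values("date", ascending=False).groupby("user_id").agg(list)

# Row label (user_id) plus column name returns the plain list
print(out.loc["user_40584", "content_id"])
# ['cont_2615_7_24', 'cont_1808_2_4', 'cont_1399_27_6']

# Or turn the whole result into {user_id: {column: list}}
as_dict = out.to_dict("index")
print(as_dict["user_44289"]["date"])
```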