Converting a pandas DataFrame into a Series

I have the following DataFrame.

   order_id   Clusters
0    519     Cluster 5
1    520     Cluster 1
2    521     Cluster 1
3    523     Cluster 5
4    524     Cluster 1
5    525     Cluster 4
6    526     Cluster 4
7    527     Cluster 1
8    528     Cluster 2
9    529     Cluster 5
10   530     Cluster 6
11   531     Cluster 3
12   532     Cluster 1
13   533     Cluster 4
14   534     Cluster 5
15   535     Cluster 5

I would like to extract the following Series from the DataFrame above.

Cluster 1   [520, 521, 524, 527, 532]
Cluster 2   [528]
Cluster 3   [531]
Cluster 4   [525, 526, 533]
Cluster 5   [519, 523, 529, 534, 535]
Cluster 6   [530]

Here is my approach in Python.

clusters_order_id = []

df_clusters = df.groupby('Clusters')

for i in df_clusters['order_id']:
   clusters_order_id.append(i)

This gives me

clusters_order_id
Out[196]: 
0    (Cluster 1, [520, 521, 524, 527, 532])
1                        (Cluster 2, [528])
2                        (Cluster 3, [531])
3              (Cluster 4, [525, 526, 533])
4    (Cluster 5, [519, 523, 529, 534, 535])
5                        (Cluster 6, [530])

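But I don't know how to turn the result above into a Series, so that Cluster 1, Cluster 2, and so on become the index and the corresponding order IDs form an array. Please help.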

You can use groupby with apply and tolist:

print(df.groupby('Clusters')['order_id'].apply(lambda x: x.tolist()))

Clusters
Cluster 1    [520, 521, 524, 527, 532]
Cluster 2                        [528]
Cluster 3                        [531]
Cluster 4              [525, 526, 533]
Cluster 5    [519, 523, 529, 534, 535]
Cluster 6                        [530]
Name: order_id, dtype: object
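
For anyone who wants to try this end to end, here is a minimal self-contained sketch for Python 3. The DataFrame is rebuilt by hand from the data in the question, and agg(list) is swapped in for apply(lambda x: x.tolist()); on recent pandas versions the two should give the same Series. The name orders_by_cluster is only illustrative.

```python
import pandas as pd

# Rebuild the DataFrame from the question (values copied from the post).
df = pd.DataFrame({
    "order_id": [519, 520, 521, 523, 524, 525, 526, 527,
                 528, 529, 530, 531, 532, 533, 534, 535],
    "Clusters": ["Cluster 5", "Cluster 1", "Cluster 1", "Cluster 5",
                 "Cluster 1", "Cluster 4", "Cluster 4", "Cluster 1",
                 "Cluster 2", "Cluster 5", "Cluster 6", "Cluster 3",
                 "Cluster 1", "Cluster 4", "Cluster 5", "Cluster 5"],
})

# Group by cluster label and collect each group's order IDs into a list.
# The result is a Series indexed by cluster label (assumed variable name).
orders_by_cluster = df.groupby("Clusters")["order_id"].agg(list)

print(orders_by_cluster)

# Each entry is an ordinary Python list, looked up by its cluster label.
print(orders_by_cluster["Cluster 4"])
```

Because the cluster labels are the index, a single cluster's order IDs can be looked up directly by label, which is what the question asks for.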

Timings:

In [153]: %timeit df.groupby('Clusters')['order_id'].apply(lambda x: x.tolist())
1000 loops, best of 3: 751 µs per loop

In [154]: %timeit df.pivot_table(index='Clusters', aggfunc=pd.Series.tolist)
100 loops, best of 3: 3.55 ms per loop

Another solution with pivot_table:

In [473]: df.pivot_table(index='Clusters', aggfunc=pd.Series.tolist)
Out[473]:
                            order_id
Clusters
Cluster 1  [520, 521, 524, 527, 532]
Cluster 2                      [528]
Cluster 3                      [531]
Cluster 4            [525, 526, 533]
Cluster 5  [519, 523, 529, 534, 535]
Cluster 6                      [530]
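
Note that pivot_table returns a one-column DataFrame rather than a Series, so one more step is needed to get the Series the question asks for. The sketch below is a hedged illustration of that step on a shortened copy of the data. It assumes a recent pandas version, passes values='order_id' explicitly, and uses the built-in list as aggfunc instead of pd.Series.tolist; the names table and series are illustrative.

```python
import pandas as pd

# A shortened copy of the question's data, enough to show the shape of the result.
df = pd.DataFrame({
    "order_id": [519, 520, 521, 523, 524, 525],
    "Clusters": ["Cluster 5", "Cluster 1", "Cluster 1",
                 "Cluster 5", "Cluster 1", "Cluster 4"],
})

# pivot_table yields a DataFrame with a single 'order_id' column of lists.
table = df.pivot_table(index="Clusters", values="order_id", aggfunc=list)

# Selecting that column gives a Series indexed by cluster label.
series = table["order_id"]

print(type(table).__name__)   # DataFrame
print(type(series).__name__)  # Series
print(series)
```

Given the timings above, the groupby version is probably the better default; the pivot_table form is mainly useful when a DataFrame result is wanted anyway.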