Pandas: finding matching entries across rows and merging columns in those rows into one
I have a DataFrame like this:
Index | Product | Purchase_Address | Order_Date |
---|---|---|---|
0 | A | 604 Cherry st, Dallas | 2019-10-28 |
1 | B | 225 5th st, LA | 2019-10-29 |
2 | C | 604 Cherry st, Dallas | 2019-10-28 |
3 | D | 225 5th st, LA | 2019-10-29 |
4 | E | 967 12th st, NY | 2019-10-27 |
5 | F | 967 12th st, NY | 2019-10-27 |
6 | A | 628 Jefferson St, NY | 2019-10-20 |
7 | B | 628 Jefferson St, NY | 2019-10-20 |
8 | A | 694 Meadow St, Atlanta | 2019-10-25 |
9 | B | 694 Meadow St, Atlanta | 2019-10-25 |
10 | C | 27 Wilson St, Austin | 2019-10-26 |
11 | D | 27 Wilson St, Austin | 2019-10-26 |
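I need to create a new DataFrame in which rows with the same address and order date (meaning the products were ordered together) have their products merged into a single column.
The df should look like this: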
Index | Product | Purchase_Address |
---|---|---|
0 | A, C | 604 Cherry st, Dallas |
1 | B, D | 225 5th st, LA |
2 | E, F | 967 12th st, NY |
3 | A, B | 628 Jefferson St, NY |
4 | A, B | 694 Meadow St, Atlanta |
5 | C, D | 27 Wilson St, Austin |
Then, from that df, I want to count how many times each combination occurs:
Index | Product_Combination | Nr_Of_Times |
---|---|---|
0 | A, C | 1 |
1 | B, D | 1 |
2 | E, F | 1 |
4 | A, B | 2 |
5 | C, D | 1 |
How can I achieve this?
Thanks!
Use Groupby.agg together with Groupby.count and Series.to_frame:
In [1783]: out = df.groupby(['Purchase_Address', 'Order_Date']).agg({'Product': ','.join}).groupby('Product')['Product'].count().to_frame('Nr_Of_Times').reset_index()

In [1784]: out
Out[1784]:
  Product  Nr_Of_Times
0     A,B            2
1     A,C            1
2     B,D            1
3     C,D            1
4     E,F            1
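For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, split into the two steps from the question. It rebuilds the sample data by hand and makes a few choices that are my assumptions, not part of the answer above: the variable names `merged` and `counts` are only illustrative, products are joined with `', '` to match the tables in the question, `sort=False` keeps groups in order of first appearance, and products are sorted within each order so that "A, B" and "B, A" would count as the same combination.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "Product": list("ABCDEFABABCD"),
    "Purchase_Address": [
        "604 Cherry st, Dallas", "225 5th st, LA",
        "604 Cherry st, Dallas", "225 5th st, LA",
        "967 12th st, NY", "967 12th st, NY",
        "628 Jefferson St, NY", "628 Jefferson St, NY",
        "694 Meadow St, Atlanta", "694 Meadow St, Atlanta",
        "27 Wilson St, Austin", "27 Wilson St, Austin",
    ],
    "Order_Date": [
        "2019-10-28", "2019-10-29", "2019-10-28", "2019-10-29",
        "2019-10-27", "2019-10-27", "2019-10-20", "2019-10-20",
        "2019-10-25", "2019-10-25", "2019-10-26", "2019-10-26",
    ],
})

# Step 1: one row per (address, date) pair, with its products joined.
# sorted() is an assumption: it makes the combination order-independent.
merged = (
    df.groupby(["Purchase_Address", "Order_Date"], sort=False)["Product"]
      .agg(lambda s: ", ".join(sorted(s)))
      .reset_index()
)

# Step 2: count how many orders share each product combination.
counts = (
    merged.groupby("Product", sort=False)
          .size()
          .reset_index(name="Nr_Of_Times")
          .rename(columns={"Product": "Product_Combination"})
)

print(merged)
print(counts)
```

If you do not need the intermediate frame, the one-liner above is enough. Keeping `merged` around is just convenient when you also want to see which address each combination came from.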