How to transform combinations of values in columns into individual columns?
I have a dataset (df) that looks like this:
Date | ID | County Name | State | State Name | Product Name | Type of Transaction | QTY |
---|---|---|---|---|---|---|---|
202105 | 10001 | Los Angeles | CA | California | Shoes | Entry | 630 |
202012 | 10002 | Houston | TX | Texas | Keyboard | Exit | 5493 |
202001 | 11684 | Chicago | IL | Illionis | Phone | Disposal | 220 |
202107 | 12005 | New York | NY | New York | Phone | Entry | 302 |
... | ... | ... | ... | ... | ... | ... | ... |
202111 | 14990 | Orlando | FL | Florida | Shoes | Exit | 201 |
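Each county has multiple entries covering different products, transaction types, and dates, but not all counties have the same number of entries, and they do not cover the same dates.
I want to reshape this dataset so that:
1 - All counties share the same start and end dates; for dates on which a county has no recorded entry, I want the entry recorded as NaN.
2 - Each combination of product name and transaction type becomes its own column.
Essentially, this is what the dataset needs to look like: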
Date | ID | County Name | State | State Name | Shoes, Entry | Shoes, Exit | Shoes, Disposal | Phones, Entry | Phones, Exit | Phones, Disposal | Keyboard, Entry | Keyboard, Exit | Keyboard, Disposal |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
202105 | 10001 | Los Angeles | CA | California | 594 | 694 | 5660 | 33299 | 1110 | 5659 | 4559 | 3223 | 56889 |
202012 | 10002 | Houston | TX | Texas | 3420 | 4439 | 549 | 2110 | 5669 | 2245 | 39294 | 3345 | 556 |
202001 | 11684 | Chicago | IL | Illionis | 55432 | 4439 | 329 | 21190 | 4320 | 455 | 34059 | 44556 | 5677 |
202107 | 12005 | New York | NY | New York | 34556 | 2204 | 4329 | 11193 | 22345 | 43221 | 1544 | 3467 | 22450 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
202111 | 14990 | Orlando | FL | Florida | 54543 | 23059 | 3290 | 21394 | 34335 | 59660 | NaN | NaN | NaN |
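As the example shows, Florida has no records for some transactions. I would like NaN added in those places so the dataframe looks like this. Thanks for any help!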
This is essentially a pivot, followed by flattening the MultiIndex columns:
(df
 .pivot(index=['Date', 'ID', 'County Name', 'State', 'State Name'],
        columns=['Product Name', 'Type of Transaction'],
        values='QTY')
 # join each (product, type) column tuple into a single flat label
 .pipe(lambda d: d.set_axis(map(','.join, d.columns), axis=1))
 .reset_index()
)
Output:
Date ID County Name State State Name Shoes,Entry Keyboard,Exit \
0 202001 11684 Chicago IL Illionis NaN NaN
1 202012 10002 Houston TX Texas NaN 5493.0
2 202105 10001 Los Angeles CA California 630.0 NaN
3 202107 12005 New York NY New York NaN NaN
Phone,Disposal Phone,Entry
0 220.0 NaN
1 NaN NaN
2 NaN NaN
3 NaN 302.0
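The pivot above only produces rows and columns for combinations that actually occur in the data. To also get the shared date range and the never-observed product/transaction columns that the question asks for, one possible follow-up is to reindex against the full set of combinations. The following is a minimal sketch, assuming `Date` holds `YYYYMM` integers at monthly frequency and pandas 1.2 or later (for `merge(how='cross')`); the sample rows and the names `county_cols`, `all_cols`, `full_index`, and `out` are illustrative and not part of the original answer.

```python
import pandas as pd

# Hypothetical sample rows modelled on the question's table (not the real data).
df = pd.DataFrame({
    "Date": [202105, 202012, 202001, 202107],
    "ID": [10001, 10002, 11684, 12005],
    "County Name": ["Los Angeles", "Houston", "Chicago", "New York"],
    "State": ["CA", "TX", "IL", "NY"],
    "State Name": ["California", "Texas", "Illionis", "New York"],
    "Product Name": ["Shoes", "Keyboard", "Phone", "Phone"],
    "Type of Transaction": ["Entry", "Exit", "Disposal", "Entry"],
    "QTY": [630, 5493, 220, 302],
})

county_cols = ["ID", "County Name", "State", "State Name"]

# Same pivot as in the answer, kept as a MultiIndex for now.
wide = df.pivot(index=["Date"] + county_cols,
                columns=["Product Name", "Type of Transaction"],
                values="QTY")

# Every (product, type) pair, including pairs that never occur in the data.
all_cols = pd.MultiIndex.from_product(
    [sorted(df["Product Name"].unique()),
     sorted(df["Type of Transaction"].unique())],
    names=["Product Name", "Type of Transaction"],
)

# Every month between the earliest and latest Date (assumes YYYYMM integers).
start = pd.to_datetime(str(df["Date"].min()), format="%Y%m").to_period("M")
end = pd.to_datetime(str(df["Date"].max()), format="%Y%m").to_period("M")
months = pd.period_range(start=start, end=end, freq="M")
dates = pd.DataFrame({"Date": months.strftime("%Y%m").astype("int64")})

# Cross join months with the distinct counties to get every (date, county) row.
counties = df[county_cols].drop_duplicates()
full_index = pd.MultiIndex.from_frame(dates.merge(counties, how="cross"))

# Reindex fills every missing row and column with NaN.
out = wide.reindex(index=full_index, columns=all_cols)
out.columns = [", ".join(col) for col in out.columns]
out = out.reset_index()
```

With the four sample rows, this gives 19 months times 4 counties, so 76 rows, and 9 value columns that are NaN everywhere except for the four observed quantities. The value columns become floats because NaN cannot be stored in an integer column.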