How to transform combinations of values in columns into individual columns?

I have a dataset (df) that looks like this:

Date    ID     County Name  State  State Name  Product Name  Type of Transaction  QTY
202105  10001  Los Angeles  CA     California  Shoes         Entry                630
202012  10002  Houston      TX     Texas       Keyboard      Exit                 5493
202001  11684  Chicago      IL     Illionis    Phone         Disposal             220
202107  12005  New York     NY     New York    Phone         Entry                302
...     ...    ...          ...    ...         ...           ...                  ...
202111  14990  Orlando      FL     Florida     Shoes         Exit                 201

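Each county has multiple entries across different products, transaction types, and dates, but not all counties have the same number of entries, and their entries do not cover the same dates.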

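I want to reshape this dataset so that:
1 - All counties share the same start and end dates; for dates on which a county has no recorded entry, I want that entry recorded as NaN.
2 - Each combination of product name and transaction type becomes its own column.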

Essentially, this is what the dataset needs to look like:

Date    ID     County Name  State  State Name  Shoes, Entry  Shoes, Exit  Shoes, Disposal  Phones, Entry  Phones, Exit  Phones, Disposal  Keyboard, Entry  Keyboard, Exit  Keyboard, Disposal
202105  10001  Los Angeles  CA     California  594           694          5660             33299          1110          5659              4559             3223            56889
202012  10002  Houston      TX     Texas       3420          4439         549              2110           5669          2245              39294            3345            556
202001  11684  Chicago      IL     Illionis    55432         4439         329              21190          4320          455               34059            44556           5677
202107  12005  New York     NY     New York    34556         2204         4329             11193          22345         43221             1544             3467            22450
...     ...    ...          ...    ...         ...           ...          ...              ...            ...           ...               ...              ...             ...
202111  14990  Orlando      FL     Florida     54543         23059        3290             21394          34335         59660             NaN              NaN             NaN

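In the example, you can see that Florida has no records for some transactions. I would like NaN filled in for those cases so that the dataframe looks like the one above. Any help is appreciated!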

This is essentially a pivot, followed by flattening the resulting column MultiIndex:

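(df
 # one row per (date, county); one column per (product, transaction type) pair
 .pivot(index=['Date', 'ID', 'County Name', 'State', 'State Name'],
        columns=['Product Name', 'Type of Transaction'],
        values='QTY')
 # flatten the two-level column MultiIndex into 'Product,Type' strings
 .pipe(lambda d: d.set_axis(map(','.join, d.columns), axis=1))
 .reset_index()
 )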

Output:

     Date     ID  County Name State  State Name  Shoes,Entry  Keyboard,Exit  \
0  202001  11684      Chicago    IL    Illionis          NaN            NaN   
1  202012  10002      Houston    TX       Texas          NaN         5493.0   
2  202105  10001  Los Angeles    CA  California        630.0            NaN   
3  202107  12005     New York    NY    New York          NaN            NaN   

   Phone,Disposal  Phone,Entry  
0           220.0          NaN  
1             NaN          NaN  
2             NaN          NaN  
3             NaN        302.0
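
The pivot alone only produces rows and columns for combinations that actually occur in the data. If every county should also cover the same date range, and every product/transaction pair should appear as a column even when it was never observed, one possible approach is to reindex the pivoted frame against a full grid. The following is a minimal sketch, not a definitive implementation. It assumes that `Date` holds YYYYMM integers, that the shared range is monthly from the earliest to the latest month in the data, and that pandas 1.2 or later is available (for `how='cross'`). The sample rows and the names `county_cols`, `grid`, `full_index`, `all_cols`, and `full` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows that mirror the layout in the question.
df = pd.DataFrame({
    'Date': [202105, 202012, 202001, 202107],
    'ID': [10001, 10002, 11684, 12005],
    'County Name': ['Los Angeles', 'Houston', 'Chicago', 'New York'],
    'State': ['CA', 'TX', 'IL', 'NY'],
    'State Name': ['California', 'Texas', 'Illionis', 'New York'],
    'Product Name': ['Shoes', 'Keyboard', 'Phone', 'Phone'],
    'Type of Transaction': ['Entry', 'Exit', 'Disposal', 'Entry'],
    'QTY': [630, 5493, 220, 302],
})

county_cols = ['ID', 'County Name', 'State', 'State Name']

# Same pivot as above, kept with its MultiIndex rows and columns for now.
wide = df.pivot(index=['Date'] + county_cols,
                columns=['Product Name', 'Type of Transaction'],
                values='QTY')

# Assumption: dates are YYYYMM integers and the shared range is monthly,
# running from the earliest to the latest month seen in the data.
months = pd.to_datetime(df['Date'].astype(str), format='%Y%m')
all_dates = (pd.date_range(months.min(), months.max(), freq='MS')
             .strftime('%Y%m').astype('int64'))

# Every (date, county) pair, in the same level order as wide's row index.
counties = df[county_cols].drop_duplicates()
grid = pd.DataFrame({'Date': all_dates}).merge(counties, how='cross')
full_index = pd.MultiIndex.from_frame(grid[['Date'] + county_cols])

# Every (product, transaction type) pair, whether observed or not.
all_cols = pd.MultiIndex.from_product(
    [df['Product Name'].unique(), df['Type of Transaction'].unique()],
    names=['Product Name', 'Type of Transaction'])

# Missing rows and columns are filled with NaN by reindex.
full = wide.reindex(index=full_index).reindex(columns=all_cols)
full.columns = [','.join(c) for c in full.columns]
full = full.reset_index()
```

Note that `pivot` raises a `ValueError` if the same date, county, product, and transaction type combination appears more than once. In that case `pivot_table` with an explicit `aggfunc` (for example `'sum'`) may be a better fit, depending on how duplicates should be combined.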