How to create various dummies based on consecutive values from another column

I have the following panel dataset. "winner" = 1 if the person was a winner in that period (date), and 0 if they were a loser.

ID   date  winner 
A   2017Q4  NaN
A   2018Q4   1         
A   2019Q4   0   
A   2020Q4   0    
A   2021Q4   1   
B   2017Q4  NaN 
B   2018Q4   1   
B   2019Q4   1   
B   2020Q4   0    
B   2021Q4   0   
C   2017Q4  NaN
C   2018Q4   0      
C   2019Q4   0    
C   2020Q4   0  
C   2021Q4   0 
D   2017Q4  NaN   
D   2018Q4   0                 
D   2019Q4   1   
D   2020Q4   1 
D   2021Q4   1 

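I want to create four dummy variables: WW = 1 if someone wins in two consecutive periods; LL = 1 if they lose in two consecutive periods; WL = 1 if they win in one period and lose in the next; and LW = 1 for the reverse (a loss followed by a win).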

Update

When I apply the answer below, I get the following result:

ID   date  winner  WW  LL  WL  LW
A   2017Q4  NaN
A   2018Q4   1     0   0   0   0  
A   2019Q4   0     0   0   1   0    
A   2020Q4   0     0   1   0   0  
A   2021Q4   1     0   0   0   1
B   2017Q4  NaN
B   2018Q4   1     0   0   0   0   
B   2019Q4   1     1   0   0   0
B   2020Q4   0     0   0   1   0
B   2021Q4   0     0   1   0   0
C   2017Q4  NaN
C   2018Q4   0     0   0   0   0      
C   2019Q4   0     0   1   0   0
C   2020Q4   0     0   1   0   0
C   2021Q4   0     0   1   0   0
D   2017Q4  NaN
D   2018Q4   0     0   0   0   0                
D   2019Q4   1     0   0   0   1
D   2020Q4   1     1   0   0   0
D   2021Q4   1     1   0   0   0

How can I make sure I get NaN when the previous value is NaN? Desired output:

ID   date  winner  WW  LL  WL  LW
A   2017Q4  NaN
A   2018Q4   1    NaN NaN NaN NaN   
A   2019Q4   0     0   0   1   0    
A   2020Q4   0     0   1   0   0  
A   2021Q4   1     0   0   0   1
B   2017Q4  NaN
B   2018Q4   1    NaN NaN NaN NaN  
B   2019Q4   1     1   0   0   0
B   2020Q4   0     0   0   1   0
B   2021Q4   0     0   1   0   0
C   2017Q4  NaN
C   2018Q4   0    NaN NaN NaN NaN    
C   2019Q4   0     0   1   0   0
C   2020Q4   0     0   1   0   0
C   2021Q4   0     0   1   0   0
D   2017Q4  NaN
D   2018Q4   0    NaN NaN NaN NaN               
D   2019Q4   1     0   0   0   1
D   2020Q4   1     1   0   0   0
D   2021Q4   1     1   0   0   0

What is the simplest way to do this?

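Here is one approach: use groupby.shift to get the previous record within each ID, then use numpy.select to label each (previous, current) pair, and finally use get_dummies to convert the labels into dummy variables: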

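import numpy as np
import pandas as pd

# Previous period's result within each ID (NaN on the first row of an ID)
df['previous'] = df.groupby('ID')['winner'].shift()
tmp = df[['previous', 'winner']]
dummy_vars = ['WW', 'LL', 'WL', 'LW']
# Label each (previous, winner) pair; '' marks rows that match no pattern
labels = np.select([tmp.eq(1).all(axis=1),
                    tmp.eq(0).all(axis=1),
                    tmp.eq([1, 0]).all(axis=1),
                    tmp.eq([0, 1]).all(axis=1)],
                   dummy_vars, '')
# dtype=int keeps 0/1 output on pandas >= 2.0, where the default dtype is bool
out = (df.join(pd.get_dummies(labels, dtype=int)[dummy_vars + ['']]
                 .mask(df['previous'].isna(), ''))
         .drop(columns=['previous', '']))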

Output:

   ID    date  winner WW LL WL LW
0   A  2018Q4       1            
1   A  2019Q4       0  0  0  1  0
2   A  2020Q4       0  0  1  0  0
3   A  2021Q4       1  0  0  0  1
4   B  2018Q4       1            
5   B  2019Q4       1  1  0  0  0
6   B  2020Q4       0  0  0  1  0
7   B  2021Q4       0  0  1  0  0
8   C  2018Q4       0            
9   C  2019Q4       0  0  1  0  0
10  C  2020Q4       0  0  1  0  0
11  C  2021Q4       0  0  1  0  0
12  D  2018Q4       0            
13  D  2019Q4       1  0  0  0  1
14  D  2020Q4       1  1  0  0  0
15  D  2021Q4       1  1  0  0  0
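
The output above leaves empty strings where the previous value is missing, whereas the update in the question asks for NaN. A minimal sketch of one way to get that, assuming the rows are already sorted by date within each ID: it swaps numpy.select and get_dummies for direct comparisons against the shifted column. The helper name add_streak_dummies and the rebuilt sample frame are illustrative, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({
    "ID": np.repeat(list("ABCD"), 5),
    "date": ["2017Q4", "2018Q4", "2019Q4", "2020Q4", "2021Q4"] * 4,
    "winner": [np.nan, 1, 0, 0, 1,
               np.nan, 1, 1, 0, 0,
               np.nan, 0, 0, 0, 0,
               np.nan, 0, 1, 1, 1],
})


def add_streak_dummies(frame, col="winner", key="ID"):
    """Hypothetical helper: add WW/LL/WL/LW, NaN where a pair is incomplete."""
    cur = frame[col]
    prev = frame.groupby(key)[col].shift()   # previous period within each ID
    valid = prev.notna() & cur.notna()       # both periods must be observed
    # (previous value, current value) that defines each dummy
    pairs = {"WW": (1, 1), "LL": (0, 0), "WL": (1, 0), "LW": (0, 1)}
    out = frame.copy()
    for name, (p, c) in pairs.items():
        hit = (prev == p) & (cur == c)
        # 1.0/0.0 where the pair is complete, NaN otherwise
        out[name] = hit.astype(float).where(valid)
    return out


result = add_streak_dummies(df)
print(result)
```

Because the mask is built from both the shifted and the current column, any row whose own value or previous value is missing ends up as NaN in all four dummies, which also covers the first row of each ID.
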
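  1. Map 1 and 0 to "W" and "L".
  2. Build the 2-period streaks.
  3. Apply get_dummies to the streaks.
  4. Join to the original DataFrame, ignoring the first row of each ID and any row whose previous value is NaN.

import pandas as pd

# NaN is mapped to "L" here; the affected rows are masked out again before the join
wins = df["winner"].fillna(0).map({1: "W", 0: "L"})
streaks = wins.shift() + wins
# Discard streaks that span two different IDs
other = pd.get_dummies(streaks.where(df["ID"].eq(df["ID"].shift())))
output = df.join(other.where(df["ID"].duplicated() & df["winner"].shift().notna()))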

>>> output

   ID    date  winner   LL   LW   WL   WW
0   A  2017Q4     NaN  NaN  NaN  NaN  NaN
1   A  2018Q4     1.0  NaN  NaN  NaN  NaN
2   A  2019Q4     0.0  0.0  0.0  1.0  0.0
3   A  2020Q4     0.0  1.0  0.0  0.0  0.0
4   A  2021Q4     1.0  0.0  1.0  0.0  0.0
5   B  2017Q4     NaN  NaN  NaN  NaN  NaN
6   B  2018Q4     1.0  NaN  NaN  NaN  NaN
7   B  2019Q4     1.0  0.0  0.0  0.0  1.0
8   B  2020Q4     0.0  0.0  0.0  1.0  0.0
9   B  2021Q4     0.0  1.0  0.0  0.0  0.0
10  C  2017Q4     NaN  NaN  NaN  NaN  NaN
11  C  2018Q4     0.0  NaN  NaN  NaN  NaN
12  C  2019Q4     0.0  1.0  0.0  0.0  0.0
13  C  2020Q4     0.0  1.0  0.0  0.0  0.0
14  C  2021Q4     0.0  1.0  0.0  0.0  0.0
15  D  2017Q4     NaN  NaN  NaN  NaN  NaN
16  D  2018Q4     0.0  NaN  NaN  NaN  NaN
17  D  2019Q4     1.0  0.0  1.0  0.0  0.0
18  D  2020Q4     1.0  0.0  0.0  0.0  1.0
19  D  2021Q4     1.0  0.0  0.0  0.0  1.0
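
On pandas 2.0 and later, get_dummies returns boolean columns by default, so the same code may print True/False instead of 0.0/1.0, and the columns come out in alphabetical order rather than the WW, LL, WL, LW order used in the question. A minimal sketch of the same streak-based approach, assuming float output and the question's column order are wanted; the dtype=float argument, the reindex step, and the rebuilt sample frame are additions for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({
    "ID": np.repeat(list("ABCD"), 5),
    "date": ["2017Q4", "2018Q4", "2019Q4", "2020Q4", "2021Q4"] * 4,
    "winner": [np.nan, 1, 0, 0, 1,
               np.nan, 1, 1, 0, 0,
               np.nan, 0, 0, 0, 0,
               np.nan, 0, 1, 1, 1],
})

wins = df["winner"].fillna(0).map({1: "W", 0: "L"})
streaks = wins.shift() + wins                 # e.g. "WL" = win, then loss
same_id = df["ID"].eq(df["ID"].shift())       # False on the first row of an ID

# dtype=float forces 0.0/1.0 regardless of the pandas version's default.
dummies = pd.get_dummies(streaks.where(same_id), dtype=float)
# Fix the column order; a pattern that never occurs becomes an all-zero column.
dummies = dummies.reindex(columns=["WW", "LL", "WL", "LW"], fill_value=0.0)

# Keep only rows that have a non-missing previous value within the same ID.
keep = df["ID"].duplicated() & df["winner"].shift().notna()
output = df.join(dummies.where(keep))
print(output)
```

The reindex step also guards against a KeyError downstream if one of the four patterns happens to be absent from the data.
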