Create many lagged variables

I have the following Python datatable:

import datatable
import numpy as np

np.random.seed(42)

dt = datatable.Frame({"A":np.repeat(np.arange(0, 2), 5), "B":np.random.normal(0, 1, 10)})
dt

#          A        B
#0         0        -0.342855
#1         0        0.0706784
#2         0        0.0470259
#3         0        -0.0522357
#4         0        -0.610938
#5         1        -2.62617
#6         1        0.550128
#7         1        0.538717
#8         1        -0.487166
#9         1        0.996788

For each value in column A, I want to create 4 lagged columns of B. This should result in the following datatable:

#          A        B               B_lag_1         B_lag_2         B_lag_3         B_lag_4
#0         0        -0.342855       NA              NA              NA              NA
#1         0        0.0706784       -0.342855       NA              NA              NA
#2         0        0.0470259       0.0706784       -0.342855       NA              NA
#3         0        -0.0522357      0.0470259       0.0706784       -0.342855       NA
#4         0        -0.610938       -0.0522357      0.0470259       0.0706784       -0.342855
#5         1        -2.62617        NA              NA              NA              NA
#6         1        0.550128        -2.62617        NA              NA              NA
#7         1        0.538717        0.550128        -2.62617        NA              NA
#8         1        -0.487166       0.538717        0.550128        -2.62617        NA
#9         1        0.996788        -0.487166       0.538717        0.550128        -2.62617

How can I do this?

I have never used datatable, but pandas.DataFrame has groupby() and shift(), and I found similar functions in datatable.
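For comparison, here is a minimal sketch of the pandas approach mentioned above, using groupby() and shift(). The variable name pdf is only illustrative, and the data is built the same way as in the question.

```python
import numpy as np
import pandas as pd

np.random.seed(42)

# Same layout as the question: two groups in A, five rows each.
pdf = pd.DataFrame({
    "A": np.repeat(np.arange(0, 2), 5),
    "B": np.random.normal(0, 1, 10),
})

# groupby("A") restarts the shift inside each group, so the first n rows
# of every group become NaN instead of borrowing values from the previous group.
for n in range(1, 5):
    pdf[f"B_lag_{n}"] = pdf.groupby("A")["B"].shift(n)
```

The datatable functions used below play the same roles: by() corresponds to groupby() and shift() to shift().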

You can use:

  • by("A") to group rows by the values in column A and work within each group separately
  • shift(datatable.f.B, n) to shift the values in column B down by n rows

import datatable as dt
import numpy as np

np.random.seed(42)

df = dt.Frame({
    "A": np.repeat(np.arange(0, 2), 5), 
    "B": np.random.normal(0, 1, 10)
})

# Add one lag column per offset; by('A') makes the shift restart within each group of A.
for n in range(1, 5):
    df[f'B_lag_{n}'] = df[:, dt.shift(dt.f.B, n), dt.by('A')]['B']

df

Result:

   |     A          B    B_lag_1    B_lag_2    B_lag_3    B_lag_4
   | int64    float64    float64    float64    float64    float64
-- + -----  ---------  ---------  ---------  ---------  ---------
 0 |     0   0.496714  NA         NA         NA         NA       
 1 |     0  -0.138264   0.496714  NA         NA         NA       
 2 |     0   0.647689  -0.138264   0.496714  NA         NA       
 3 |     0   1.52303    0.647689  -0.138264   0.496714  NA       
 4 |     0  -0.234153   1.52303    0.647689  -0.138264   0.496714
 5 |     1  -0.234137  NA         NA         NA         NA       
 6 |     1   1.57921   -0.234137  NA         NA         NA       
 7 |     1   0.767435   1.57921   -0.234137  NA         NA       
 8 |     1  -0.469474   0.767435   1.57921   -0.234137  NA       
 9 |     1   0.54256   -0.469474   0.767435   1.57921   -0.234137
[10 rows x 6 columns]
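
To make the NA pattern in the result easier to follow, here is a pure-Python sketch of what a grouped shift computes. It is not how datatable implements by() or shift(); the function grouped_lag and the small sample lists are hypothetical, and None stands in for NA.

```python
from collections import defaultdict


def grouped_lag(keys, values, n):
    """Shift values down by n rows within each group of keys (assumes n >= 1).

    Rows that have fewer than n earlier rows in their group get None.
    """
    history = defaultdict(list)  # values seen so far for each key, in row order
    lagged = []
    for key, value in zip(keys, values):
        seen = history[key]
        # The n-th most recent earlier value of this group, if there is one.
        lagged.append(seen[-n] if len(seen) >= n else None)
        seen.append(value)
    return lagged


keys = [0, 0, 0, 1, 1, 1]
values = [10, 20, 30, 40, 50, 60]

lag1 = grouped_lag(keys, values, 1)  # [None, 10, 20, None, 40, 50]
lag2 = grouped_lag(keys, values, 2)  # [None, None, 10, None, None, 40]
```

Each group starts with n missing values because a lag never reaches back across a group boundary, which is why rows 0 and 5 in the table above are NA in every lag column.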