How do I create interaction columns for all columns in tidyverse?

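I am trying to create interaction variables for all 20 variables in a data frame, so that in total there are 20 base variables and 380 interaction variables. For any single variable, I can create a data frame containing its 19 interaction columns like this: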

in_sample[3:22] %>%
transmute(across(.cols = -c(frpm_frac_s), .fns = function(x){x*frpm_frac_s}))

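However, I can't iterate over the columns. I tried using map over a vector of column names, but I couldn't get the function inside map to read as.symbol(character). Here is a sample of my data from dput: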

structure(list(frpm_frac_s = c(0.870400011539459, 0.904699981212616, 
0.98089998960495, 0.838800013065338, 0.919900000095367, 0.837700009346008, 
0.84799998998642, 0.925999999046326, 0.963900029659271, 0.887899994850159
), enrollment_s = c(364, 608, 571, 705, 566, 838, 421, 757, 693, 
535), ell_frac_s = c(0.46000000834465, 0.334000021219254, 0.300999999046326, 
0.209999993443489, 0.706999957561493, 0.552999973297119, 0.412999987602234, 
0.359000027179718, 0.726000010967255, 0.646999955177307), edi_s = c(8, 
38, 39, 37, 11, 35, 15, 39, 9, 4), te_fte_s = c(23, 22, 20, 25, 
24.5, 36, 18, 30.2999992370605, 24.3999996185303, 19)), row.names = c(NA, 
10L), class = "data.frame")

When I run:

 in_sample[3:22] %>%
    transmute(across(.cols = -c(frpm_frac_s), .fns = function(x){x*frpm_frac_s}))

I get:

structure(list(enrollment_s = c(316.825604200363, 550.057588577271, 
560.093894064426, 591.354009211063, 520.663400053978, 701.992607831955, 
357.007995784283, 700.981999278069, 667.982720553875, 475.026497244835
), ell_frac_s = c(0.400384012571335, 0.302169812922072, 0.295250895935631, 
0.17614799724412, 0.650369261028242, 0.463248082799339, 0.350223985351086, 
0.33243402482605, 0.699791432103968, 0.574471256869984), edi_s = c(6.96320009231567, 
34.3785992860794, 38.255099594593, 31.0356004834175, 10.118900001049, 
29.3195003271103, 12.7199998497963, 36.1139999628067, 8.67510026693344, 
3.55159997940063), te_fte_s = c(20.0192002654076, 19.9033995866776, 
19.617999792099, 20.9700003266335, 22.5375500023365, 30.1572003364563, 
15.2639998197556, 28.0577992646217, 23.5191603559875, 16.870099902153
)), row.names = c(NA, 10L), class = "data.frame")

I want to do this for all variables and then bind the results together. Thanks for your help.

You can use model.matrix to create the interaction terms. (This is what most modeling functions do behind the scenes.)

m = model.matrix(~ .^2 - . + 0, data = df)
m
#    frpm_frac_s:enrollment_s frpm_frac_s:ell_frac_s frpm_frac_s:edi_s frpm_frac_s:te_fte_s
# 1                  316.8256              0.4003840            6.9632             20.01920
# 2                  550.0576              0.3021698           34.3786             19.90340
# 3                  560.0939              0.2952509           38.2551             19.61800
# 4                  591.3540              0.1761480           31.0356             20.97000
# 5                  520.6634              0.6503693           10.1189             22.53755
# 6                  701.9926              0.4632481           29.3195             30.15720
# 7                  357.0080              0.3502240           12.7200             15.26400
# 8                  700.9820              0.3324340           36.1140             28.05780
# 9                  667.9827              0.6997914            8.6751             23.51916
# 10                 475.0265              0.5744713            3.5516             16.87010
#    enrollment_s:ell_frac_s enrollment_s:edi_s enrollment_s:te_fte_s ell_frac_s:edi_s
# 1                  167.440               2912                8372.0            3.680
# 2                  203.072              23104               13376.0           12.692
# 3                  171.871              22269               11420.0           11.739
# 4                  148.050              26085               17625.0            7.770
# 5                  400.162               6226               13867.0            7.777
# 6                  463.414              29330               30168.0           19.355
# 7                  173.873               6315                7578.0            6.195
# 8                  271.763              29523               22937.1           14.001
# 9                  503.118               6237               16909.2            6.534
# 10                 346.145               2140               10165.0            2.588
#    ell_frac_s:te_fte_s edi_s:te_fte_s
# 1              10.5800          184.0
# 2               7.3480          836.0
# 3               6.0200          780.0
# 4               5.2500          925.0
# 5              17.3215          269.5
# 6              19.9080         1260.0
# 7               7.4340          270.0
# 8              10.8777         1181.7
# 9              17.7144          219.6
# 10             12.2930           76.0
# attr(,"assign")
#  [1]  1  2  3  4  5  6  7  8  9 10
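
The question also asks to bind the interaction columns back onto the base variables. A minimal sketch of one way to do that in base R, assuming a small stand-in data frame df (three columns, values rounded from the dput sample) and an arbitrary "_x_" separator for the new column names, neither of which comes from the original post:

```r
# Stand-in for the question's data: three columns, values rounded from the dput sample
df <- data.frame(
  frpm_frac_s  = c(0.8704, 0.9047, 0.9809),
  enrollment_s = c(364, 608, 571),
  edi_s        = c(8, 38, 39)
)

# Interaction-only model matrix, built with the same formula as above
m <- model.matrix(~ .^2 - . + 0, data = df)

# Convert to a data frame and replace ":" so the names are syntactic
# ("_x_" is an arbitrary, hypothetical separator)
inter <- as.data.frame(m)
names(inter) <- gsub(":", "_x_", names(inter), fixed = TRUE)

# Base variables followed by their pairwise products
combined <- cbind(df, inter)
names(combined)
```

If a tibble is preferred, dplyr::bind_cols(df, inter) should give the same columns; cbind is used here only to keep the sketch free of package dependencies.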

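Your math is a little off: because order doesn't matter in multiplication, there are n * (n - 1) / 2 possibilities (the same as n choose 2), so you should expect 190 output columns from 20 input columns.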
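
The count can be checked directly in R. This short sketch only illustrates the arithmetic; n is the number of base variables from the question:

```r
n <- 20

# Unordered pairs: each product x * y is counted once
unordered_pairs <- choose(n, 2)
unordered_pairs            # 190
n * (n - 1) / 2            # 190, the same value

# Ordered pairs: counts x * y and y * x separately, which gives the 380 in the question
ordered_pairs <- n * (n - 1)
ordered_pairs              # 380

# For the 5-column sample, this matches the 10 columns printed above
ncol(combn(5, 2))          # 10
```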

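I set the formula to include only the interaction terms; you could use ~ .^2 + 0 to also include the first-order terms, or ~ .^2 to also include the intercept.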
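
To make the difference between the three formulas concrete, here is a small sketch on a hypothetical toy data frame (toy, with columns x1, x2, x3, is made up for illustration and is not the question's data):

```r
# Hypothetical toy data: three numeric columns
toy <- data.frame(
  x1 = c(1, 2, 3, 4),
  x2 = c(2, 1, 0, 5),
  x3 = c(3, 3, 1, 2)
)

# Interactions only: x1:x2, x1:x3, x2:x3
only_inter <- model.matrix(~ .^2 - . + 0, data = toy)

# First-order terms plus interactions, no intercept
with_main <- model.matrix(~ .^2 + 0, data = toy)

# Intercept, first-order terms, and interactions
with_intercept <- model.matrix(~ .^2, data = toy)

ncol(only_inter)       # 3
ncol(with_main)        # 6
ncol(with_intercept)   # 7
colnames(with_intercept)
```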