Passing a function with columns as parameters within a double loop in R

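I'm having trouble applying a function (some formula) that takes two parameters (two columns) while looping over the columns of a data frame. I have 20 variables, and I want the function to be computed for all combinations of two variables, with each result saved in a new variable (so for three columns, the script should create 3 x 3 = 9 new variables). So I want to use a double loop.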

Example:

dflist1 <- c(dat$DIM1, dat$DIM2, dat$DIM3)
k = 1
    for (i in dflist1) {
        for (j in dflist1) {

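However, I want to apply a function that contains a formula and takes two parameters: one column as the first and another column as the second.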

Example of the function:

calc <- function(i,j)
{
    abs(i*j)/(i*j)*log(1+abs(i*j))
}

calc(i = dat$DIM1, j = dat$DIM2)

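Back to the for loop: this is where the problem arises. When I try to apply the function and save the result in another column, only the last computed result is kept (I have not yet set up the iteration over the newly created variables):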

for (i in dflist1) {

    for (j in dflist1) {

    dat$kk <- mapply(FUN = calc, i, j, SIMPLIFY = TRUE)
    print(dat$kk)
    }

k = k + 1

}

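Can anyone help me? I need to compute the function iteratively (row-wise) over all column combinations and write each result to a new column.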

You can create the new columns with nested lapply and then bind them to your data with nested do.call:

set.seed(123)
df <- data.frame(a = rnorm(1000), b = rnorm(1000), c = rnorm(1000))
calc <- function(i,j)
{
  abs(i*j)/(i*j)*log(1+abs(i*j))
}

newcols <- lapply(df, function(x) lapply(df, function(y) calc(x, y)))

df_new <- cbind(df, do.call(cbind, do.call(cbind, newcols)))

> head(df_new)
            a           b          c           1            2           3            4            5            6           7
1 -0.56047565 -0.99579872 -0.5116037 0.273177095  0.443480566  0.25211300  0.443480566 0.6889459178  0.411748217  0.25211300
2 -0.23017749 -1.03995504  0.2369379 0.051625832  0.214606608 -0.05310253  0.214606608 0.7330919071 -0.220263201 -0.05310253
3  1.55870831 -0.01798024 -0.5415892 1.232435358 -0.027640410 -0.61203449 -0.027640410 0.0003232368  0.009690796 -0.61203449
4  0.07050839 -0.13217513  1.2192276 0.004959116 -0.009276298  0.08246971 -0.009276298 0.0173194151 -0.149412251  0.08246971
5  0.12928774 -2.54934277  0.1741359 0.016577155 -0.284877208  0.02226394 -0.284877208 2.0147894919 -0.367369972  0.02226394
6  1.71506499  1.04057346 -0.6152683 1.371548145  1.024122587 -0.72038540  1.024122587 0.7337098375 -0.494837621 -0.72038540
             8          9
1  0.411748217 0.23249043
2 -0.220263201 0.05462033
3  0.009690796 0.25721165
4 -0.149412251 0.91088256
5 -0.367369972 0.02987264
6 -0.494837621 0.32103592
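
The nested lapply result has unnamed columns (1 to 9). As a minimal sketch of the double loop the question asks for, the version below iterates over column names rather than values and writes each result to its own named column. Note that `c(dat$DIM1, dat$DIM2, dat$DIM3)` in the question concatenates all values into one long vector, so a loop over it visits single numbers, not columns. The name `df_named` and the `a_b` naming scheme are assumptions made for illustration, and a small 5-row data frame stands in for the real data.

```r
set.seed(123)
df <- data.frame(a = rnorm(5), b = rnorm(5), c = rnorm(5))

calc <- function(i, j) {
  abs(i * j) / (i * j) * log(1 + abs(i * j))
}

cols <- names(df)   # the columns to combine
df_named <- df      # work on a copy so df stays unchanged

for (i in cols) {
  for (j in cols) {
    # build a distinct target name per pair, e.g. "a_b" (hypothetical scheme)
    df_named[[paste(i, j, sep = "_")]] <- calc(df[[i]], df[[j]])
  }
}

names(df_named)
```

Because every pair gets its own column name, no result overwrites another, which is what happened with the single `dat$kk` column in the question.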

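A data.table solution: the CJ() cross join function creates the 3 x 3 pairs of input arguments for calc(), dcast() reshapes the computed results from long to wide format, and a join appends the 9 computed columns to the original data.frame: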

library(data.table)
setDT(df)[, rn := .I]
df[df[, CJ(c(a, b, c), c(a, b, c)), by = rn][
  , value := calc(V1, V2)][
    , dcast(.SD, rn ~ rowid(rn))], on = .(rn)][, !"rn"]
                a           b          c           1            2           3            4
   1: -0.56047565 -0.99579872 -0.5116037 0.688945918  0.443480566  0.41174822  0.443480566
   2: -0.23017749 -1.03995504  0.2369379 0.733091907  0.214606608 -0.22026320  0.214606608
   3:  1.55870831 -0.01798024 -0.5415892 0.257211652  0.009690796 -0.61203449  0.009690796
   4:  0.07050839 -0.13217513  1.2192276 0.017319415 -0.009276298 -0.14941225 -0.009276298
   5:  0.12928774 -2.54934277  0.1741359 2.014789492 -0.284877208 -0.36736997 -0.284877208
  ---                                                                                     
 996: -0.08997520  0.07664366  1.0609662 0.008062943 -0.006872360 -0.09117496 -0.006872360
 997:  1.07051604  0.25516476 -0.4455056 0.181050147 -0.107667460 -0.38995947 -0.107667460
 998: -1.35110039  0.27744682 -0.4291802 1.038675520  0.457339699 -0.31835082  0.457339699
 999: -0.52261670  0.53685602  1.1890118 0.241477031 -0.247305230 -0.48328838 -0.247305230
1000: -0.24919068 -0.46048557  0.8342941 0.192310630  0.108629008 -0.32510818  0.108629008
                 5           6           7           8          9
   1: 0.2731770948  0.25211300  0.41174822  0.25211300 0.23249043
   2: 0.0516258319 -0.05310253 -0.22026320 -0.05310253 0.05462033
   3: 0.0003232368 -0.02764041 -0.61203449 -0.02764041 1.23243536
   4: 0.0049591165  0.08246971 -0.14941225  0.08246971 0.91088256
   5: 0.0165771550  0.02226394 -0.36736997  0.02226394 0.02987264
  ---                                                            
 996: 0.0058570650  0.07817913 -0.09117496  0.07817913 0.75407734
 997: 0.0630771939  0.24150040 -0.38995947  0.24150040 0.76360778
 998: 0.1690637292 -0.11250215 -0.31835082 -0.11250215 0.07415780
 999: 0.2532570646  0.49367630 -0.48328838  0.49367630 0.88118117
1000: 0.0602443085 -0.18888191 -0.32510818 -0.18888191 0.52830001

Note that the computed values may appear in different columns than in the nested lapply answer above.

Edit: Improved version for commutative functions

Apparently, the OP's function is commutative, i.e., calc(1, 2) returns the same value as calc(2, 1). This is why we find only 6 distinct computed values in each row.
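
The commutativity can be checked directly. The sketch below also shows that the formula amounts to sign(x) * log(1 + |x|) with x = i * j. The helper `calc_sign` is a hypothetical rewrite for illustration, not part of the original answers; it differs from calc() only when the product is exactly zero.

```r
calc <- function(i, j) {
  abs(i * j) / (i * j) * log(1 + abs(i * j))
}

# hypothetical rewrite: sign(x) * log1p(|x|) with x = i * j
calc_sign <- function(i, j) {
  x <- i * j
  sign(x) * log1p(abs(x))
}

set.seed(123)
x <- rnorm(10)
y <- rnorm(10)

all.equal(calc(x, y), calc(y, x))       # swapping the arguments changes nothing
all.equal(calc(x, y), calc_sign(x, y))  # same values for non-zero products

calc(0, 1)       # NaN, because abs(0) / 0 is undefined
calc_sign(0, 1)  # 0
```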

For a commutative function, we can skip computing the 3 duplicate values. So instead of doing a full cross join

CJ(1:3, 1:3)
   V1 V2
1:  1  1
2:  1  2
3:  1  3
4:  2  1
5:  2  2
6:  2  3
7:  3  1
8:  3  2
9:  3  3

we can use only the unique combinations

CJ(1:3, 1:3)[V1 <= V2]
   V1 V2
1:  1  1
2:  1  2
3:  1  3
4:  2  2
5:  2  3
6:  3  3

Note that this is only an example to illustrate the effect; it cannot be used on its own.

We need to modify the complete data.table expression:

df[df[, CJ(c(a, b, c), c(a, b, c), sorted = FALSE)[V1 <= V2], by = rn][
  , value := calc(V1, V2)][
    , dcast(.SD, rn ~ rowid(rn))], on = .(rn)][, !"rn"]
                a           b          c           1            2           3
   1: -0.56047565 -0.99579872 -0.5116037 0.688945918  0.443480566  0.41174822
   2: -0.23017749 -1.03995504  0.2369379 0.733091907  0.214606608 -0.22026320
   3:  1.55870831 -0.01798024 -0.5415892 0.257211652  0.009690796 -0.61203449
   4:  0.07050839 -0.13217513  1.2192276 0.017319415 -0.009276298 -0.14941225
   5:  0.12928774 -2.54934277  0.1741359 2.014789492 -0.284877208 -0.36736997
  ---                                                                        
 996: -0.08997520  0.07664366  1.0609662 0.008062943 -0.006872360 -0.09117496
 997:  1.07051604  0.25516476 -0.4455056 0.181050147 -0.107667460 -0.38995947
 998: -1.35110039  0.27744682 -0.4291802 1.038675520  0.457339699 -0.31835082
 999: -0.52261670  0.53685602  1.1890118 0.241477031 -0.247305230 -0.48328838
1000: -0.24919068 -0.46048557  0.8342941 0.192310630  0.108629008 -0.32510818
                 4           5          6
   1: 0.2731770948  0.25211300 0.23249043
   2: 0.0516258319 -0.05310253 0.05462033
   3: 0.0003232368 -0.02764041 1.23243536
   4: 0.0049591165  0.08246971 0.91088256
   5: 0.0165771550  0.02226394 0.02987264
  ---                                    
 996: 0.0058570650  0.07817913 0.75407734
 997: 0.0630771939  0.24150040 0.76360778
 998: 0.1690637292 -0.11250215 0.07415780
 999: 0.2532570646  0.49367630 0.88118117
1000: 0.0602443085 -0.18888191 0.52830001

Note that sorted = FALSE is needed to keep the order of the values passed to CJ(). By default, CJ() sorts the values.
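
A small sketch of the sorting behaviour, assuming the data.table package is installed. The variable names `sorted_pairs` and `unsorted_pairs` are chosen for illustration only.

```r
library(data.table)

# default: each input vector is sorted before the cross join
sorted_pairs <- CJ(c(3, 1, 2), c(3, 1, 2))

# sorted = FALSE: the input order 3, 1, 2 is kept
unsorted_pairs <- CJ(c(3, 1, 2), c(3, 1, 2), sorted = FALSE)

head(sorted_pairs, 3)    # first column starts with 1, 1, 1
head(unsorted_pairs, 3)  # first column starts with 3, 3, 3
```

This is also why the default call places the results in a different column order than the column order of the data frame.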

Edit 2: Save typing column names

With many columns, typing all column names twice for the cross join can be tedious.

This can be addressed with the following modification:

df[df[, CJ(do.call("c", .SD), do.call("c", .SD), sorted = FALSE), by = rn][
  , value := calc(V1, V2)][
    , dcast(.SD, rn ~ rowid(rn))], on = .(rn)][, !"rn"]

df[df[, CJ(do.call("c", .SD), do.call("c", .SD), sorted = FALSE)[V1 <= V2], by = rn][
  , value := calc(V1, V2)][
    , dcast(.SD, rn ~ rowid(rn))], on = .(rn)][, !"rn"]

The first expression covers the general case, the second the commutative case.

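do.call() constructs and executes a function call from a name or a function and a list of arguments to be passed to it (see help(do.call)). .SD is a special symbol that denotes the subset of data for each group, excluding any columns used in by. Since we group by each row, .SD here is a list with one value per column, which is passed to the c() function.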
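
The effect of do.call("c", ...) on a single row can be sketched in base R. Here `row_as_list` is a hypothetical stand-in for what .SD holds for one row group, with made-up values.

```r
# one "row" as a list with one value per column, similar to .SD for a single-row group
row_as_list <- list(a = -0.56, b = -1.00, c = -0.51)

# equivalent to calling c(a = -0.56, b = -1.00, c = -0.51)
row_as_vector <- do.call("c", row_as_list)

row_as_vector
identical(row_as_vector, unlist(row_as_list))
```

The result is a plain vector with one element per column, which is the form CJ() expects for each of its arguments.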

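By referring to .SD, all columns of df are used for the cross join, except those in the by argument. However, we can use the .SDcols parameter to specify the columns to include in the cross join, e.g.,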

df[df[, CJ(do.call("c", .SD), do.call("c", .SD), sorted = FALSE)[V1 <= V2], by = rn, 
      .SDcols = 1:2][
        , value := calc(V1, V2)][
          , dcast(.SD, rn ~ rowid(rn))], on = .(rn)][, !"rn"]

will use only the first two columns of df:
                a           b          c            1            2           3
   1: -0.56047565 -0.99579872 -0.5116037 0.6889459178  0.443480566 0.273177095
   2: -0.23017749 -1.03995504  0.2369379 0.7330919071  0.214606608 0.051625832
   3:  1.55870831 -0.01798024 -0.5415892 0.0003232368 -0.027640410 1.232435358
   4:  0.07050839 -0.13217513  1.2192276 0.0173194151 -0.009276298 0.004959116
   5:  0.12928774 -2.54934277  0.1741359 2.0147894919 -0.284877208 0.016577155
  ---                                                                         
 996: -0.08997520  0.07664366  1.0609662 0.0080629430 -0.006872360 0.005857065
 997:  1.07051604  0.25516476 -0.4455056 0.0630771939  0.241500405 0.763607781
 998: -1.35110039  0.27744682 -0.4291802 1.0386755196 -0.318350818 0.074157797
 999: -0.52261670  0.53685602  1.1890118 0.2414770311 -0.247305230 0.253257065
1000: -0.24919068 -0.46048557  0.8342941 0.1923106299  0.108629008 0.060244308

Edit 3: Preserving the order of combinations when dropping duplicates

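The sorted = FALSE parameter is necessary but not sufficient to keep the created combinations in the same order with respect to column numbers. This is because [V1 <= V2] compares values, not column positions.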

Therefore, we must make sure that the same rows are always dropped from the table of combinations. Here is a small example:

test <- data.table(rn = 1:3, 
                   a = LETTERS[c(3L, 1:2)],
                   b = LETTERS[c(2:3, 1L)], 
                   c = LETTERS[1:3])
test
   rn a b c
1:  1 C B A
2:  2 A C B
3:  3 B A C
# dropping duplicates by value
test[, CJ(c(a, b, c), c(a, b, c), sorted = FALSE)[,cn := .I][V1 <= V2], by = rn]
    rn V1 V2 cn
 1:  1  C  C  1
 2:  1  B  C  4
 3:  1  B  B  5
 4:  1  A  C  7
 5:  1  A  B  8
 6:  1  A  A  9
 7:  2  A  A  1
 8:  2  A  C  2
 9:  2  A  B  3
10:  2  C  C  5
11:  2  B  C  8
12:  2  B  B  9
13:  3  B  B  1
14:  3  B  C  3
15:  3  A  B  4
16:  3  A  A  5
17:  3  A  C  6
18:  3  C  C  9
# dropping duplicates by position
drop <- CJ(1:3, 1:3)[V1 > V2, which = TRUE]
test[, CJ(c(a, b, c), c(a, b, c), sorted = FALSE)[,cn := .I][-drop], by = rn]
    rn V1 V2 cn
 1:  1  C  C  1
 2:  1  C  B  2
 3:  1  C  A  3
 4:  1  B  B  5
 5:  1  B  A  6
 6:  1  A  A  9
 7:  2  A  A  1
 8:  2  A  C  2
 9:  2  A  B  3
10:  2  C  C  5
11:  2  C  B  6
12:  2  B  B  9
13:  3  B  B  1
14:  3  B  A  2
15:  3  B  C  3
16:  3  A  A  5
17:  3  A  C  6
18:  3  C  C  9

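For illustration, the created combinations were numbered consecutively before filtering. Filtering by position keeps the same combinations cn for every input row rn.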
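
The positions to drop depend only on the number of columns, not on the values. A base R sketch of the same idea for three columns follows; `pos` and `combos` are hypothetical names, and expand.grid() is used here in place of CJ() so that the snippet runs without data.table.

```r
# all ordered position pairs, in the same row order as CJ(1:3, 1:3)
pos <- expand.grid(V2 = 1:3, V1 = 1:3)[, c("V1", "V2")]

# rows where the first position comes after the second are duplicates
drop <- which(pos$V1 > pos$V2)
drop

# apply the positions to the values of the first test row (C, B, A)
vals <- c("C", "B", "A")
combos <- data.frame(V1 = vals[pos$V1], V2 = vals[pos$V2],
                     cn = seq_len(nrow(pos)))
combos[-drop, ]
```

The rows that remain carry cn 1, 2, 3, 5, 6 and 9, matching the by-position output for rn = 1 in the example above.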

If all columns of df are used to create the combinations while preserving position, the code finally becomes

drop <- CJ(seq_along(df), seq_along(df))[V1 > V2, which = TRUE]
setDT(df)[, rn := .I] # execution order is important, drop needs to be computed first
df[df[, CJ(do.call("c", .SD), do.call("c", .SD), sorted = FALSE)[-drop], by = rn][
        , value := calc(V1, V2)][
          , dcast(.SD, rn ~ rowid(rn))], on = .(rn)][, !"rn"]
                a           b          c           1            2           3            4
   1: -0.56047565 -0.99579872 -0.5116037 0.273177095  0.443480566  0.25211300 0.6889459178
   2: -0.23017749 -1.03995504  0.2369379 0.051625832  0.214606608 -0.05310253 0.7330919071
   3:  1.55870831 -0.01798024 -0.5415892 1.232435358 -0.027640410 -0.61203449 0.0003232368
   4:  0.07050839 -0.13217513  1.2192276 0.004959116 -0.009276298  0.08246971 0.0173194151
   5:  0.12928774 -2.54934277  0.1741359 0.016577155 -0.284877208  0.02226394 2.0147894919
  ---                                                                                     
 996: -0.08997520  0.07664366  1.0609662 0.008062943 -0.006872360 -0.09117496 0.0058570650
 997:  1.07051604  0.25516476 -0.4455056 0.763607781  0.241500405 -0.38995947 0.0630771939
 998: -1.35110039  0.27744682 -0.4291802 1.038675520 -0.318350818  0.45733970 0.0741577975
 999: -0.52261670  0.53685602  1.1890118 0.241477031 -0.247305230 -0.48328838 0.2532570646
1000: -0.24919068 -0.46048557  0.8342941 0.060244308  0.108629008 -0.18888191 0.1923106299
                 5          6
   1:  0.411748217 0.23249043
   2: -0.220263201 0.05462033
   3:  0.009690796 0.25721165
   4: -0.149412251 0.91088256
   5: -0.367369972 0.02987264
  ---                        
 996:  0.078179132 0.75407734
 997: -0.107667460 0.18105015
 998: -0.112502154 0.16906373
 999:  0.493676298 0.88118117
1000: -0.325108180 0.52830001