Passing a function with parameters as columns within a double loop in R
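I have a problem applying a function (some formula) that takes two parameters (two columns) while looping over the columns of a data frame. I have 20 variables, and I want the function to calculate all combinations of two variables and save each result in a new variable (so, with the three columns in the example below, the script should create 3 x 3 = 9 new variables). So I thought of using a double loop.
Example: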
dflist1 <- c(dat$DIM1, dat$DIM2, dat$DIM3)
k = 1
for (i in dflist1) {
for (j in dflist1) {
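However, I want to apply a function that contains a formula and takes two parameters: one column as the first and another column as the second.
Example of the function: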
calc <- function(i,j)
{
abs(i*j)/(i*j)*log(1+abs(i*j))
}
calc(i = dat$DIM1, j = dat$DIM2)
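Back to the for loop. Here is the problem: when I try to apply the function and save the result in another column, only the last computed result is saved (I have not yet set up the iteration over the newly created variables):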
for (i in dflist1) {
for (j in dflist1) {
dat$kk <- mapply(FUN = calc, i, j, SIMPLIFY = TRUE)
print(dat$kk)
}
k = k + 1
}
Can anyone help me? I need to iterate over all column combinations, calculate row by row, and write the results to new columns.
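One likely reason the loop above keeps only the last result: c(dat$DIM1, dat$DIM2, dat$DIM3) concatenates the three columns into one long vector, so i and j are single numbers rather than columns, and dat$kk is overwritten on every pass. A minimal sketch of a loop that iterates over column names instead and writes each result to its own column. The toy data and the DIM1_DIM2 naming scheme are assumptions made for illustration:

```r
# Toy data standing in for the asker's `dat` (values are made up)
set.seed(1)
dat <- data.frame(DIM1 = rnorm(5), DIM2 = rnorm(5), DIM3 = rnorm(5))

calc <- function(i, j) {
  abs(i * j) / (i * j) * log(1 + abs(i * j))
}

# c() flattens the three columns into a single numeric vector of length 15
flat <- c(dat$DIM1, dat$DIM2, dat$DIM3)
length(flat)

# Loop over column names and build a new column name for each pair
dims <- c("DIM1", "DIM2", "DIM3")
for (i in dims) {
  for (j in dims) {
    # calc() is vectorised, so it can take whole columns directly
    dat[[paste(i, j, sep = "_")]] <- calc(dat[[i]], dat[[j]])
  }
}

names(dat)
```

Because calc() only uses vectorised arithmetic, mapply() is not needed here.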
You can use nested lapply to create the new columns, then nested do.call to bind them to your data:
set.seed(123)
df <- data.frame(a = rnorm(1000), b = rnorm(1000), c = rnorm(1000))
calc <- function(i,j)
{
abs(i*j)/(i*j)*log(1+abs(i*j))
}
newcols <- lapply(df, function(x) lapply(df, function(y) calc(x, y)))
df_new <- cbind(df, do.call(cbind, do.call(cbind, newcols)))
> head(df_new)
a b c 1 2 3 4 5 6 7
1 -0.56047565 -0.99579872 -0.5116037 0.273177095 0.443480566 0.25211300 0.443480566 0.6889459178 0.411748217 0.25211300
2 -0.23017749 -1.03995504 0.2369379 0.051625832 0.214606608 -0.05310253 0.214606608 0.7330919071 -0.220263201 -0.05310253
3 1.55870831 -0.01798024 -0.5415892 1.232435358 -0.027640410 -0.61203449 -0.027640410 0.0003232368 0.009690796 -0.61203449
4 0.07050839 -0.13217513 1.2192276 0.004959116 -0.009276298 0.08246971 -0.009276298 0.0173194151 -0.149412251 0.08246971
5 0.12928774 -2.54934277 0.1741359 0.016577155 -0.284877208 0.02226394 -0.284877208 2.0147894919 -0.367369972 0.02226394
6 1.71506499 1.04057346 -0.6152683 1.371548145 1.024122587 -0.72038540 1.024122587 0.7337098375 -0.494837621 -0.72038540
8 9
1 0.411748217 0.23249043
2 -0.220263201 0.05462033
3 0.009690796 0.25721165
4 -0.149412251 0.91088256
5 -0.367369972 0.02987264
6 -0.494837621 0.32103592
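The new columns above come out named 1 to 9. A sketch of one way to give them descriptive names, assuming the column order produced by the nested lapply (the first argument of calc() varies slowest) and a hypothetical x_y naming scheme that is not part of the original answer:

```r
set.seed(123)
df <- data.frame(a = rnorm(1000), b = rnorm(1000), c = rnorm(1000))

calc <- function(i, j) {
  abs(i * j) / (i * j) * log(1 + abs(i * j))
}

newcols <- lapply(df, function(x) lapply(df, function(y) calc(x, y)))
res <- do.call(cbind, do.call(cbind, newcols))

# The outer lapply index (x) varies slowest, the inner index (y) fastest
nm <- names(df)
colnames(res) <- paste(rep(nm, each = length(nm)),
                       rep(nm, times = length(nm)),
                       sep = "_")

df_new <- cbind(df, res)
names(df_new)
```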
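A data.table solution that uses the CJ() cross join function to create the 3 x 3 pairs of input parameters for the calc() function, dcast() to reshape the computed results from long to wide format, and a join to append the 9 computed columns to the original data.frame: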
library(data.table)
setDT(df)[, rn := .I]
df[df[, CJ(c(a, b, c), c(a, b, c)), by = rn][
, value := calc(V1, V2)][
, dcast(.SD, rn ~ rowid(rn))], on = .(rn)][, !"rn"]
a b c 1 2 3 4
1: -0.56047565 -0.99579872 -0.5116037 0.688945918 0.443480566 0.41174822 0.443480566
2: -0.23017749 -1.03995504 0.2369379 0.733091907 0.214606608 -0.22026320 0.214606608
3: 1.55870831 -0.01798024 -0.5415892 0.257211652 0.009690796 -0.61203449 0.009690796
4: 0.07050839 -0.13217513 1.2192276 0.017319415 -0.009276298 -0.14941225 -0.009276298
5: 0.12928774 -2.54934277 0.1741359 2.014789492 -0.284877208 -0.36736997 -0.284877208
---
996: -0.08997520 0.07664366 1.0609662 0.008062943 -0.006872360 -0.09117496 -0.006872360
997: 1.07051604 0.25516476 -0.4455056 0.181050147 -0.107667460 -0.38995947 -0.107667460
998: -1.35110039 0.27744682 -0.4291802 1.038675520 0.457339699 -0.31835082 0.457339699
999: -0.52261670 0.53685602 1.1890118 0.241477031 -0.247305230 -0.48328838 -0.247305230
1000: -0.24919068 -0.46048557 0.8342941 0.192310630 0.108629008 -0.32510818 0.108629008
5 6 7 8 9
1: 0.2731770948 0.25211300 0.41174822 0.25211300 0.23249043
2: 0.0516258319 -0.05310253 -0.22026320 -0.05310253 0.05462033
3: 0.0003232368 -0.02764041 -0.61203449 -0.02764041 1.23243536
4: 0.0049591165 0.08246971 -0.14941225 0.08246971 0.91088256
5: 0.0165771550 0.02226394 -0.36736997 0.02226394 0.02987264
---
996: 0.0058570650 0.07817913 -0.09117496 0.07817913 0.75407734
997: 0.0630771939 0.24150040 -0.38995947 0.24150040 0.76360778
998: 0.1690637292 -0.11250215 -0.31835082 -0.11250215 0.07415780
999: 0.2532570646 0.49367630 -0.48328838 0.49367630 0.88118117
1000: 0.0602443085 -0.18888191 -0.32510818 -0.18888191 0.52830001
Note that the computed values may appear in different columns when compared with the other answer.
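The chained expression above can be hard to read. A sketch that splits it into named steps on a small table; the intermediate names long, wide and result are introduced here for illustration, and value.var is spelled out rather than guessed by dcast():

```r
library(data.table)

set.seed(123)
df <- data.table(a = rnorm(5), b = rnorm(5), c = rnorm(5))

calc <- function(i, j) {
  abs(i * j) / (i * j) * log(1 + abs(i * j))
}

# Row number, used both as grouping variable and as join key
df[, rn := .I]

# Step 1: per row, cross join the three values with themselves (long format)
long <- df[, CJ(c(a, b, c), c(a, b, c)), by = rn]

# Step 2: apply calc() to every pair
long[, value := calc(V1, V2)]

# Step 3: reshape so that each row gets its 9 results side by side
wide <- dcast(long, rn ~ rowid(rn), value.var = "value")

# Step 4: join back to the original columns and drop the helper column
result <- df[wide, on = .(rn)][, !"rn"]
result
```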
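Edit: Improved version for commutative functions
Apparently, the OP's function is commutative, i.e., calc(1, 2) returns the same value as calc(2, 1). This is why we find only 6 distinct computed values in each row.
For a commutative function, we can save computing the 3 duplicate values. So instead of doing a full cross join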
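A quick numerical check of the commutativity claim, as a sketch on made-up inputs. It also shows that the leading factor abs(i*j)/(i*j) is simply the sign of the product; calc_sign is a hypothetical helper added here and is not part of the original answer:

```r
calc <- function(i, j) {
  abs(i * j) / (i * j) * log(1 + abs(i * j))
}

set.seed(42)
x <- rnorm(10)
y <- rnorm(10)

# Swapping the arguments does not change the result
all.equal(calc(x, y), calc(y, x))

# Same values via sign(); the two differ only when i * j == 0,
# where calc() yields NaN (0 / 0) and calc_sign() yields 0
calc_sign <- function(i, j) sign(i * j) * log1p(abs(i * j))
all.equal(calc(x, y), calc_sign(x, y))
```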
CJ(1:3, 1:3)
V1 V2
1: 1 1
2: 1 2
3: 1 3
4: 2 1
5: 2 2
6: 2 3
7: 3 1
8: 3 2
9: 3 3
we can use only the unique combinations
CJ(1:3, 1:3)[V1 <= V2]
V1 V2
1: 1 1
2: 1 2
3: 1 3
4: 2 2
5: 2 3
6: 3 3
Note that this is just an example to illustrate the effect and cannot be used stand-alone. We need to modify the complete data.table expression:
df[df[, CJ(c(a, b, c), c(a, b, c), sorted = FALSE)[V1 <= V2], by = rn][
, value := calc(V1, V2)][
, dcast(.SD, rn ~ rowid(rn))], on = .(rn)][, !"rn"]
a b c 1 2 3
1: -0.56047565 -0.99579872 -0.5116037 0.688945918 0.443480566 0.41174822
2: -0.23017749 -1.03995504 0.2369379 0.733091907 0.214606608 -0.22026320
3: 1.55870831 -0.01798024 -0.5415892 0.257211652 0.009690796 -0.61203449
4: 0.07050839 -0.13217513 1.2192276 0.017319415 -0.009276298 -0.14941225
5: 0.12928774 -2.54934277 0.1741359 2.014789492 -0.284877208 -0.36736997
---
996: -0.08997520 0.07664366 1.0609662 0.008062943 -0.006872360 -0.09117496
997: 1.07051604 0.25516476 -0.4455056 0.181050147 -0.107667460 -0.38995947
998: -1.35110039 0.27744682 -0.4291802 1.038675520 0.457339699 -0.31835082
999: -0.52261670 0.53685602 1.1890118 0.241477031 -0.247305230 -0.48328838
1000: -0.24919068 -0.46048557 0.8342941 0.192310630 0.108629008 -0.32510818
4 5 6
1: 0.2731770948 0.25211300 0.23249043
2: 0.0516258319 -0.05310253 0.05462033
3: 0.0003232368 -0.02764041 1.23243536
4: 0.0049591165 0.08246971 0.91088256
5: 0.0165771550 0.02226394 0.02987264
---
996: 0.0058570650 0.07817913 0.75407734
997: 0.0630771939 0.24150040 0.76360778
998: 0.1690637292 -0.11250215 0.07415780
999: 0.2532570646 0.49367630 0.88118117
1000: 0.0602443085 -0.18888191 0.52830001
Note that sorted = FALSE is required to keep the order of the values supplied to CJ(). By default, CJ() sorts the values.
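A small sketch of the difference, using a made-up vector that is deliberately not in ascending order. The columns are named explicitly here so the example does not depend on how CJ() names unnamed inputs:

```r
library(data.table)

vals <- c(3, 1, 2)

# Default: each input vector is sorted before the combinations are built
sorted_cj <- CJ(V1 = vals, V2 = vals)
sorted_cj[1:3]

# sorted = FALSE keeps the order in which the values were supplied
unsorted_cj <- CJ(V1 = vals, V2 = vals, sorted = FALSE)
unsorted_cj[1:3]
```

With the default, the first rows pair the smallest value with each of the others; with sorted = FALSE they start with the first supplied value, 3.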
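Edit 2: Save typing the column names
With more columns, typing all column names twice for the cross join can become tedious. This can be solved by the following modification: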
df[df[, CJ(do.call("c", .SD), do.call("c", .SD), sorted = FALSE), by = rn][
, value := calc(V1, V2)][
, dcast(.SD, rn ~ rowid(rn))], on = .(rn)][, !"rn"]
or
df[df[, CJ(do.call("c", .SD), do.call("c", .SD), sorted = FALSE)[V1 <= V2], by = rn][
, value := calc(V1, V2)][
, dcast(.SD, rn ~ rowid(rn))], on = .(rn)][, !"rn"]
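for the commutative case.
do.call() constructs and executes a function call from a name or a function and a list of arguments to be passed to it (see help(do.call)). .SD is a special symbol that denotes the subset of data for each group, excluding any columns used in by. Because we group by each row, .SD here is a list with one value per column, which is passed to the c() function.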
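What do.call("c", .SD) does for a single row can be sketched in base R. The list one_row is a made-up stand-in for what .SD holds when grouping by row:

```r
# One row of data as a list with one value per column,
# similar to .SD when every group is a single row
one_row <- list(a = -0.56, b = -1.00, c = -0.51)

# do.call("c", one_row) builds and runs the call c(a = -0.56, b = -1.00, c = -0.51)
vals <- do.call("c", one_row)
vals

# The result is the same as typing the call out by hand
identical(vals, c(a = -0.56, b = -1.00, c = -0.51))
```

This works for any number of columns, which is why the column names no longer have to be typed.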
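By referring to .SD, all columns of df are used for the cross join except those in the by parameter. However, the .SDcols parameter can be used to specify which columns to include in the cross join, e.g.,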
df[df[, CJ(do.call("c", .SD), do.call("c", .SD), sorted = FALSE)[V1 <= V2], by = rn,
.SDcols = 1:2][
, value := calc(V1, V2)][
, dcast(.SD, rn ~ rowid(rn))], on = .(rn)][, !"rn"]
will use only the first two columns of df:
a b c 1 2 3
1: -0.56047565 -0.99579872 -0.5116037 0.6889459178 0.443480566 0.273177095
2: -0.23017749 -1.03995504 0.2369379 0.7330919071 0.214606608 0.051625832
3: 1.55870831 -0.01798024 -0.5415892 0.0003232368 -0.027640410 1.232435358
4: 0.07050839 -0.13217513 1.2192276 0.0173194151 -0.009276298 0.004959116
5: 0.12928774 -2.54934277 0.1741359 2.0147894919 -0.284877208 0.016577155
---
996: -0.08997520 0.07664366 1.0609662 0.0080629430 -0.006872360 0.005857065
997: 1.07051604 0.25516476 -0.4455056 0.0630771939 0.241500405 0.763607781
998: -1.35110039 0.27744682 -0.4291802 1.0386755196 -0.318350818 0.074157797
999: -0.52261670 0.53685602 1.1890118 0.2414770311 -0.247305230 0.253257065
1000: -0.24919068 -0.46048557 0.8342941 0.1923106299 0.108629008 0.060244308
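Edit 3: Keep the order of combinations when dropping duplicates
The parameter sorted = FALSE is required but not sufficient to keep the created combinations always in the same order with respect to column number. This is because [V1 <= V2] compares values, not column positions.
So we have to make sure that the same rows are always removed from the table of combinations. Here is a small example: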
test <- data.table(rn = 1:3,
a = LETTERS[c(3L, 1:2)],
b = LETTERS[c(2:3, 1L)],
c = LETTERS[1:3])
test
rn a b c
1: 1 C B A
2: 2 A C B
3: 3 B A C
# dropping duplicates by value
test[, CJ(c(a, b, c), c(a, b, c), sorted = FALSE)[,cn := .I][V1 <= V2], by = rn]
rn V1 V2 cn
1: 1 C C 1
2: 1 B C 4
3: 1 B B 5
4: 1 A C 7
5: 1 A B 8
6: 1 A A 9
7: 2 A A 1
8: 2 A C 2
9: 2 A B 3
10: 2 C C 5
11: 2 B C 8
12: 2 B B 9
13: 3 B B 1
14: 3 B C 3
15: 3 A B 4
16: 3 A A 5
17: 3 A C 6
18: 3 C C 9
# dropping duplicates by position
drop <- CJ(1:3, 1:3)[V1 > V2, which = TRUE]
test[, CJ(c(a, b, c), c(a, b, c), sorted = FALSE)[,cn := .I][-drop], by = rn]
rn V1 V2 cn
1: 1 C C 1
2: 1 C B 2
3: 1 C A 3
4: 1 B B 5
5: 1 B A 6
6: 1 A A 9
7: 2 A A 1
8: 2 A C 2
9: 2 A B 3
10: 2 C C 5
11: 2 C B 6
12: 2 B B 9
13: 3 B B 1
14: 3 B A 2
15: 3 B C 3
16: 3 A A 5
17: 3 A C 6
18: 3 C C 9
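For illustration, the created combinations have been numbered consecutively before being filtered. Filtering by position keeps the same combinations cn for each input row rn.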
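The positions to drop depend only on the number of columns, not on the values. A sketch that recomputes them in base R for n columns, assuming the same row order as CJ(1:n, 1:n), where the second index varies fastest:

```r
n <- 3

# expand.grid() varies its first argument fastest, so V2 is listed first
# to mimic the row order of CJ(1:n, 1:n)
pos <- expand.grid(V2 = seq_len(n), V1 = seq_len(n))

# Rows below the diagonal repeat a pair that already occurs above it
drop <- which(pos$V1 > pos$V2)
drop   # 4 7 8

# n * (n + 1) / 2 combinations remain
nrow(pos) - length(drop)
```

For n = 3 this leaves combinations 1, 2, 3, 5, 6 and 9, matching the cn column in the output above.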
If all columns of df are to be used to create combinations while maintaining position, the code finally becomes
drop <- CJ(seq_along(df), seq_along(df))[V1 > V2, which = TRUE]
setDT(df)[, rn := .I] # execution order is important, drop needs to be computed first
df[df[, CJ(do.call("c", .SD), do.call("c", .SD), sorted = FALSE)[-drop], by = rn][
, value := calc(V1, V2)][
, dcast(.SD, rn ~ rowid(rn))], on = .(rn)][, !"rn"]
a b c 1 2 3 4
1: -0.56047565 -0.99579872 -0.5116037 0.273177095 0.443480566 0.25211300 0.6889459178
2: -0.23017749 -1.03995504 0.2369379 0.051625832 0.214606608 -0.05310253 0.7330919071
3: 1.55870831 -0.01798024 -0.5415892 1.232435358 -0.027640410 -0.61203449 0.0003232368
4: 0.07050839 -0.13217513 1.2192276 0.004959116 -0.009276298 0.08246971 0.0173194151
5: 0.12928774 -2.54934277 0.1741359 0.016577155 -0.284877208 0.02226394 2.0147894919
---
996: -0.08997520 0.07664366 1.0609662 0.008062943 -0.006872360 -0.09117496 0.0058570650
997: 1.07051604 0.25516476 -0.4455056 0.763607781 0.241500405 -0.38995947 0.0630771939
998: -1.35110039 0.27744682 -0.4291802 1.038675520 -0.318350818 0.45733970 0.0741577975
999: -0.52261670 0.53685602 1.1890118 0.241477031 -0.247305230 -0.48328838 0.2532570646
1000: -0.24919068 -0.46048557 0.8342941 0.060244308 0.108629008 -0.18888191 0.1923106299
5 6
1: 0.411748217 0.23249043
2: -0.220263201 0.05462033
3: 0.009690796 0.25721165
4: -0.149412251 0.91088256
5: -0.367369972 0.02987264
---
996: 0.078179132 0.75407734
997: -0.107667460 0.18105015
998: -0.112502154 0.16906373
999: 0.493676298 0.88118117
1000: -0.325108180 0.52830001