Looping assignment by reference through columns
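I found many similar questions, but none of them fits my problem.
I have two large data frames that share only a few columns. I am trying to assign, by reference, the values of the second data frame (DB2) to the first (DB).
I have tried several combinations, but none of them works, for example: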
library(data.table)
#create dfs
set.seed(32)
DB <- data.frame(A=sample(c("A","B","C","D","E"),30,replace = T))
DB2 <- data.frame(A=sample(c("A","B","C","D","E","F","G"),60,replace = T),
                  B=rep(rnorm(60,mean=5)),
                  C=rep(rnorm(60,mean=10)))
#loop
for (i in c("B","C")){
  setDT(DB)[DB2, i := i, on = .(A == A)]
}
In other words, I want to turn the code below into a loop:
setDT(DB)[DB2, B := B, on = .(A == A)]
setDT(DB)[DB2, C := C, on = .(A == A)]
> DB
A B C
1: C 5.593191 10.697466
2: C 5.593191 10.697466
3: E 4.482933 8.726371
4: D 5.454512 11.054162
5: A 4.306571 11.427917
6: E 4.482933 8.726371
7: D 5.454512 11.054162
8: E 4.482933 8.726371
9: D 5.454512 11.054162
10: B 4.741633 10.846106
11: D 5.454512 11.054162
12: B 4.741633 10.846106
13: D 5.454512 11.054162
14: D 5.454512 11.054162
15: B 4.741633 10.846106
16: D 5.454512 11.054162
17: D 5.454512 11.054162
18: C 5.593191 10.697466
19: D 5.454512 11.054162
20: E 4.482933 8.726371
21: D 5.454512 11.054162
22: E 4.482933 8.726371
23: C 5.593191 10.697466
24: A 4.306571 11.427917
25: C 5.593191 10.697466
26: E 4.482933 8.726371
27: C 5.593191 10.697466
28: C 5.593191 10.697466
29: C 5.593191 10.697466
30: D 5.454512 11.054162
A B C
Any help would be greatly appreciated.
Try:
library(data.table)
#create dfs
set.seed(32)
DB <- data.frame(A = sample(c("A","B","C","D","E"), 30, replace = TRUE))
DB2 <- data.frame(A = sample(c("A","B","C","D","E","F","G"), 60, replace = TRUE),
                  B = rnorm(60, mean = 5),
                  C = rnorm(60, mean = 10))
#try: assign both columns in a single update join
#(the i. prefix makes explicit that the values come from DB2)
setDT(DB)[DB2, c("B", "C") := list(i.B, i.C), on = .(A == A)]
DB #output
A B C
1: C 5.593191 10.697466
2: C 5.593191 10.697466
3: E 4.482933 8.726371
4: D 5.454512 11.054162
5: A 4.306571 11.427917
6: E 4.482933 8.726371
7: D 5.454512 11.054162
8: E 4.482933 8.726371
9: D 5.454512 11.054162
10: B 4.741633 10.846106
11: D 5.454512 11.054162
12: B 4.741633 10.846106
13: D 5.454512 11.054162
14: D 5.454512 11.054162
15: B 4.741633 10.846106
16: D 5.454512 11.054162
17: D 5.454512 11.054162
18: C 5.593191 10.697466
19: D 5.454512 11.054162
20: E 4.482933 8.726371
21: D 5.454512 11.054162
22: E 4.482933 8.726371
23: C 5.593191 10.697466
24: A 4.306571 11.427917
25: C 5.593191 10.697466
26: E 4.482933 8.726371
27: C 5.593191 10.697466
28: C 5.593191 10.697466
29: C 5.593191 10.697466
30: D 5.454512 11.054162
A B C
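As for why the loop in the question does not do what was intended: a bare symbol on the left-hand side of := is taken as a literal column name, so i := i adds a column called "i" rather than "B" or "C". A minimal sketch on a toy table (dt and new_col are made-up names for illustration, not from the question):

```r
library(data.table)

# Toy table; dt and new_col are illustrative names only
dt <- data.table(A = c("x", "y"))
new_col <- "B"

# Bare symbol on the left of := is used literally:
# this adds a column named "new_col", not "B"
dt[, new_col := 1]

# Wrapping the variable in parentheses evaluates it first:
# this adds a column named "B"
dt[, (new_col) := 2]

names(dt)
```

So inside a loop the column name variable needs to be wrapped in parentheses on the left, and the right-hand side needs to look the column up by name instead of using the loop variable itself.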
Update
Franck's suggestion should also work, and it is more efficient when there are many columns (note that mget returns a named list):
cols <- colnames(DB2)[!(colnames(DB2) %in% colnames(DB))]
setDT(DB)[DB2, (cols) := mget(paste0("i.", cols)), on = .(A = A)]
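If an explicit loop is still wanted, as the question asks, the same idea can be written one column at a time with get() in place of mget(). This is a sketch rather than a recommendation: DB_all, DB_loop and nm are names introduced here for the comparison, and it assumes a data.table version in which get() can see the i.-prefixed columns inside a join, as mget() does above.

```r
library(data.table)

# Same toy data as above, built directly as data.tables
set.seed(32)
DB  <- data.table(A = sample(c("A","B","C","D","E"), 30, replace = TRUE))
DB2 <- data.table(A = sample(c("A","B","C","D","E","F","G"), 60, replace = TRUE),
                  B = rnorm(60, mean = 5),
                  C = rnorm(60, mean = 10))

# Columns that exist in DB2 but not in DB
cols <- setdiff(names(DB2), names(DB))

# All columns in one update join
DB_all <- copy(DB)
DB_all[DB2, (cols) := mget(paste0("i.", cols)), on = "A"]

# One update join per column; (nm) is evaluated to the column name,
# and get() fetches the matching column of DB2 via the i. prefix
DB_loop <- copy(DB)
for (nm in cols) {
  DB_loop[DB2, (nm) := get(paste0("i.", nm)), on = "A"]
}

# The two approaches should agree
all.equal(DB_all, DB_loop)
```

The loop repeats the join once per column, so the single mget() call is likely the better choice when there are many columns.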