Subset data frame A by matching against data frame B, but keep the relevant column(s) of B
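I have two data frames, say A and B, and I want to subset A using the information in a column (by matching). So far so good; that part I already know how to do (I am aware of match, which, and %in%). What I need is slightly more complicated, and I just can't figure it out: I also want to keep track of the information in one or more columns of B.
An example: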
words <- c("Hello","Experts","Please","Help","Me","Out","With","This","Problem","!")
letterV <- toupper(letters[1:20])
numbersV <- 1:20
combV <- paste("Test_",letters[1:10],sep = "")
rainbowV <- rainbow(20)
testDF.A <- as.data.frame(cbind(letterV,cbind(numbersV,cbind(combV,rainbowV))),stringsAsFactors = F)
testDF.B <- as.data.frame(cbind(numbersV[1:10],cbind(letterV[1:10],cbind(combV,words))),stringsAsFactors = F)
testDF.A:
letterV numbersV combV rainbowV
1 A 1 Test_a #FF0000FF
2 B 2 Test_b #FF4D00FF
3 C 3 Test_c #FF9900FF
4 D 4 Test_d #FFE500FF
5 E 5 Test_e #CCFF00FF
6 F 6 Test_f #80FF00FF
7 G 7 Test_g #33FF00FF
8 H 8 Test_h #00FF19FF
9 I 9 Test_i #00FF66FF
10 J 10 Test_j #00FFB2FF
11 K 11 Test_a #00FFFFFF
12 L 12 Test_b #00B3FFFF
13 M 13 Test_c #0066FFFF
14 N 14 Test_d #001AFFFF
15 O 15 Test_e #3300FFFF
16 P 16 Test_f #7F00FFFF
17 Q 17 Test_g #CC00FFFF
18 R 18 Test_h #FF00E6FF
19 S 19 Test_i #FF0099FF
20 T 20 Test_j #FF004DFF
testDF.B:
V1 V2 combV words
1 1 A Test_a Hello
2 2 B Test_b Experts
3 3 C Test_c Please
4 4 D Test_d Help
5 5 E Test_e Me
6 6 F Test_f Out
7 7 G Test_g With
8 8 H Test_h This
9 9 I Test_i Problem
10 10 J Test_j !
Suppose I want to compare/match A[3] against B[3] but keep the information in B[4].
Expected result:
keptCol letterV numbersV combV rainbowV
1 Hello A 1 Test_a #FF0000FF
2 Experts B 2 Test_b #FF4D00FF
3 Please C 3 Test_c #FF9900FF
4 Help D 4 Test_d #FFE500FF
5 Me E 5 Test_e #CCFF00FF
6 Out F 6 Test_f #80FF00FF
7 With G 7 Test_g #33FF00FF
8 This H 8 Test_h #00FF19FF
9 Problem I 9 Test_i #00FF66FF
10 ! J 10 Test_j #00FFB2FF
11 Hello K 11 Test_a #00FFFFFF
12 Experts L 12 Test_b #00B3FFFF
13 Please M 13 Test_c #0066FFFF
14 Help N 14 Test_d #001AFFFF
15 Me O 15 Test_e #3300FFFF
16 Out P 16 Test_f #7F00FFFF
17 With Q 17 Test_g #CC00FFFF
18 This R 18 Test_h #FF00E6FF
19 Problem S 19 Test_i #FF0099FF
20 ! T 20 Test_j #FF004DFF
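For this first case, where every combV value occurs only once in B, the lookup can be sketched with match() alone. The snippet below is a minimal illustration, not the only way to do it. It rebuilds the example data with data.frame() (an assumption made for readability, so numbersV is an integer here rather than a character string), and the names idx, keep, and result are made up for the example.

```r
# Rebuild the example data (same values as above, built with data.frame()).
words <- c("Hello","Experts","Please","Help","Me","Out","With","This","Problem","!")
combV <- paste0("Test_", letters[1:10])
testDF.A <- data.frame(letterV  = LETTERS[1:20],
                       numbersV = 1:20,
                       combV    = rep(combV, 2),
                       rainbowV = rainbow(20),
                       stringsAsFactors = FALSE)
testDF.B <- data.frame(V1 = 1:10,
                       V2 = LETTERS[1:10],
                       combV = combV,
                       words = words,
                       stringsAsFactors = FALSE)

# For each row of A, match() gives the position of the FIRST row of B
# with the same combV value, or NA if there is none.
idx  <- match(testDF.A$combV, testDF.B$combV)
keep <- !is.na(idx)

# Drop the unmatched rows of A and put the looked-up column of B in front.
result <- data.frame(keptCol = testDF.B$words[idx[keep]],
                     testDF.A[keep, ],
                     stringsAsFactors = FALSE)
head(result, 3)
```

Because match() only ever returns the first hit, this approach silently ignores repeated keys in B, which is why the multiple-hit case needs a join instead.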
To make it trickier, I actually also need to handle multiple hits.
Suppose I want to compare/match A[2] against B[1] but keep the information in B[2]:
small.a <- as.data.frame(cbind(letters[1:6],cbind(c("Test_a","Test_a","Test_b","Test_a","Test_c","Test_z"),c("a1","b1","c1","d1","e1","f1"))),stringsAsFactors = F)
small.b <- as.data.frame(cbind(c("Test_a","Test_b","Test_d","Test_e","Test_b"),c("Thank","You","Very","Much","!")),stringsAsFactors = F)
small.a
V1 V2 V3
1 a Test_a a1
2 b Test_a b1
3 c Test_b c1
4 d Test_a d1
5 e Test_c e1
6 f Test_z f1
small.b
V1 V2
1 Test_a Thank
2 Test_b You
3 Test_d Very
4 Test_e Much
5 Test_b !
Expected result:
keptCol V1 V2 V3
1 Thank a Test_a a1
2 Thank b Test_a b1
3 You c Test_b c1
4 ! c Test_b c1
5 Thank d Test_a d1
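A follow-up question would be how to do this if I want to keep the information from several columns of B...
I hope the information I have provided is enough for you to understand the problem :)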
You can try an inner join with dplyr:
library(dplyr)
inner_join(small.a, small.b, by=c('V2'='V1'))
Or use merge from base R:
merge(small.a, small.b, by.x='V2', by.y='V1')
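By default merge() sorts its result on the key column, so the rows do not come back in the order shown in the expected result, and the kept column is not named keptCol. The sketch below is one possible way to get closer to that layout with base R only. The helper columns a_row and b_row and the data frames lookup, a, and joined are hypothetical names introduced for the illustration.

```r
# Example data from the question, built with data.frame().
small.a <- data.frame(V1 = letters[1:6],
                      V2 = c("Test_a","Test_a","Test_b","Test_a","Test_c","Test_z"),
                      V3 = c("a1","b1","c1","d1","e1","f1"),
                      stringsAsFactors = FALSE)
small.b <- data.frame(V1 = c("Test_a","Test_b","Test_d","Test_e","Test_b"),
                      V2 = c("Thank","You","Very","Much","!"),
                      stringsAsFactors = FALSE)

# Keep only the key and the column to carry over, under unambiguous names,
# and remember B's original row order.
lookup <- data.frame(key     = small.b$V1,
                     keptCol = small.b$V2,
                     b_row   = seq_len(nrow(small.b)),
                     stringsAsFactors = FALSE)

# Remember A's original row order as well.
a <- small.a
a$a_row <- seq_len(nrow(a))

# Inner join: every row of A is repeated once per matching row of B,
# and rows of A without a match (Test_c, Test_z) are dropped.
joined <- merge(a, lookup, by.x = "V2", by.y = "key")

# Restore A's order (ties broken by B's order) and the requested column layout.
joined <- joined[order(joined$a_row, joined$b_row), c("keptCol", "V1", "V2", "V3")]
rownames(joined) <- NULL
joined
```

To keep several columns of B, add them to lookup next to keptCol; the merge() call itself stays the same, and the final column selection just lists the extra names.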