Inexact merge of two data frames by row name in R
I have two data frames: A has 189 rows and 79 columns, and B has 354 rows and 2 columns. Some of their row names match approximately, as follows:
A:
Sample value1 value2 value3
10003 a b d
10003_Qi1 a a c
10003_Qi2 b a c
10017 b g c
10018 b f s
10025_Qi o w c
10040_Qi1 x y o
10040_ArT1 e g g
10125 p g m
10140_Ar1 w n c
10225 z c p
B:
Sample first
10003 4
10004 8
10018 45
10025 85
10032 7
10040 54
10140 2
10132 8
10200 65
10324 9
10400 32
I would like to merge the two data frames based on an inexact match of the row names, which should give:
Sample value1 value2 value3 first
10003 a b d 4
10018 b f s 45
10025_Qi o w c 85
10040_Qi1 x y o 54
10140_Ar1 w n c 2
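The rules are:
Use the exact match if there is one; otherwise,
if the first five digits match:
a. when A contains both _Qi1 and _Qi2, the sample in B takes the values of the A sample with _Qi1;
b. when A contains both _Qi1 and _ArT1, the sample in B takes the values of the A sample with _Qi1, and the rows are merged.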
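As a rough sketch of how these rules could be expressed in base R (not necessarily the best way), one option is to give every row of A a priority and keep only the best-ranked row per five-digit key before calling merge(). The columns key and priority are hypothetical helper names, the data are the example rows from above, and ties that the rules do not cover (for example _Qi2 against _ArT1 with no _Qi1 present) are assumed to fall back to the original row order.

```r
# Example data copied from the tables above
A <- data.frame(
  Sample = c("10003", "10003_Qi1", "10003_Qi2", "10017", "10018", "10025_Qi",
             "10040_Qi1", "10040_ArT1", "10125", "10140_Ar1", "10225"),
  value1 = c("a", "a", "b", "b", "b", "o", "x", "e", "p", "w", "z"),
  value2 = c("b", "a", "a", "g", "f", "w", "y", "g", "g", "n", "c"),
  value3 = c("d", "c", "c", "c", "s", "c", "o", "g", "m", "c", "p"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Sample = c("10003", "10004", "10018", "10025", "10032", "10040",
             "10140", "10132", "10200", "10324", "10400"),
  first = c(4, 8, 45, 85, 7, 54, 2, 8, 65, 9, 32),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: the first five characters of the sample name
A$key <- substr(A$Sample, 1, 5)

# Hypothetical helper column: 1 = exact match, 2 = _Qi1, 3 = anything else
A$priority <- ifelse(A$Sample == A$key, 1L,
                     ifelse(grepl("_Qi1$", A$Sample), 2L, 3L))

# Sort by key, then priority, then original row order (assumed tie-breaker),
# and keep the first row of every key
ord <- order(A$key, A$priority, seq_len(nrow(A)))
best <- A[ord, ]
best <- best[!duplicated(best$key), ]

# Inner join on the five-digit key; keys absent from B are dropped
merged <- merge(best, B, by.x = "key", by.y = "Sample")
merged <- merged[, c("Sample", "value1", "value2", "value3", "first")]
print(merged)
```

On the example data this keeps 10003, 10018, 10025_Qi, 10040_Qi1 and 10140_Ar1, which is the desired result shown above. Whether the priority scheme is right for the full 189-row table depends on which suffixes actually occur there.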
Here is my rather cumbersome solution:
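# build the example tables column by column so that numbers stay numeric
# (mixing numbers and strings in one matrix() coerces everything to character)
A <- data.frame(sample = c("10003", "10003_q1", "10007_q1", "10008"),
                value1 = c(1, 2, 3, 2),
                value2 = c(4, 3, 1, 2),
                stringsAsFactors = FALSE)
# sample value1 value2
# 1 10003 1 4
# 2 10003_q1 2 3
# 3 10007_q1 3 1
# 4 10008 2 2
B <- data.frame(sample = c("10003", "10004", "10007", "10009"),
                first = c(4, 8, 45, 85),
                stringsAsFactors = FALSE)
# sample first
# 1 10003 4
# 2 10004 8
# 3 10007 45
# 4 10009 85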
# step 1: adapt both data frames
A$first <- NA
A$sample2 <- strtrim(A$sample, 5)   # first five characters of the sample name
B$sample <- as.character(B$sample)  # compare sample names as text
# step 2: work down table A, merging values from table B
# note: this assumes that B$sample is unique
for (i in seq_len(nrow(A))) {
  ind <- A$sample2[i] == B$sample
  if (any(ind)) {  # makes sure a value was found
    A[i, "first"] <- B$first[ind]
  }
}
# step 3: remove any duplicates of A$sample2
# note: this assumes that the 5 digit number will always come before the number+extension
A <- A[!duplicated(A$sample2), ]
# sample value1 value2 first sample2
# 1 10003 1 4 4 10003
# 3 10007_q1 3 1 45 10007
# 4 10008 2 2 NA 10008
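The loop in step 2 can probably be replaced by a single lookup with match(), which returns the position of each five-character key in B$sample, or NA when there is none. The following is only a sketch on the same toy data; like the loop, it assumes that B$sample is unique and that the plain five-digit name comes before its suffixed variants in A.

```r
# Same toy tables as above, built with explicit column types
A <- data.frame(sample = c("10003", "10003_q1", "10007_q1", "10008"),
                value1 = c(1, 2, 3, 2),
                value2 = c(4, 3, 1, 2),
                stringsAsFactors = FALSE)
B <- data.frame(sample = c("10003", "10004", "10007", "10009"),
                first = c(4, 8, 45, 85),
                stringsAsFactors = FALSE)

# First five characters of each sample name in A
A$sample2 <- substr(A$sample, 1, 5)

# match() gives the row of B for every key in A (NA if the key is absent),
# so indexing B$first with it fills the whole column at once
A$first <- B$first[match(A$sample2, B$sample)]

# Keep the first row of every five-character key, as in step 3
A <- A[!duplicated(A$sample2), ]
print(A)
```

Rows without a partner in B keep NA in first; adding A <- A[!is.na(A$first), ] afterwards would drop them, which is what the desired output in the question shows.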