Inexact merge of two data frames by row name in R

Question

I have two data frames: A has 189 rows and 79 columns, and B has 354 rows and 2 columns. Some of their row names match approximately, as follows:

A:
Sample       value1     value2    value3
10003          a          b        d
10003_Qi1      a          a        c
10003_Qi2      b          a        c
10017          b          g        c
10018          b          f        s
10025_Qi       o          w        c
10040_Qi1      x          y        o
10040_ArT1     e          g        g
10125          p          g        m
10140_Ar1      w          n        c
10225          z          c        p

B:
Sample      first
10003       4
10004       8
10018       45
10025       85
10032       7
10040       54
10140       2
10132       8
10200       65
10324       9
10400       32

I want to merge the two data frames based on an inexact match of the row names, which should result in:

Sample     value1       value2    value3     first
10003          a          b        d            4   
10018          b          f        s            45
10025_Qi       o          w        c            85
10040_Qi1      x          y        o            54
10140_Ar1      w          n        c            2

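The rules are:

If there is an exact match, merge on it; otherwise,
if the first five digits match:

a. when A contains both _Qi1 and _Qi2, the sample in B takes the values of the sample in A with _Qi1;

b. when A contains both _Qi1 and _ArT1, the sample in B takes the values of the sample in A with _Qi1, and the two are merged.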

Answer 1

Here is my long-winded solution:

A<-data.frame(matrix(c(10003,"10003_q1","10007_q1",10008,1,2,3,2,4,3,1,2),4,3))
colnames(A)<-c("sample","value1","value2")

#     sample value1 value2
# 1    10003      1      4
# 2 10003_q1      2      3
# 3 10007_q1      3      1
# 4    10008      2      2

B<-data.frame(matrix(c(10003,10004,10007,10009,4,8,45,85),4,2))
colnames(B)<-c("sample","first")

#   sample first
# 1  10003     4
# 2  10004     8
# 3  10007    45
# 4  10009    85

# step 1: adapt both dataframes
A$first<-NA
A$sample2<-strtrim(A$sample,5)
B$sample<-as.factor(B$sample)

# step 2: work down table A, merging in values from table B
# note: this assumes that B$sample is unique

for(i in seq_len(nrow(A))){
  ind<-A$sample2[i]==B$sample
  if(sum(ind)!=0){ # makes sure a value was found
    A[i,"first"]<-B$first[ind]
  }
}

# step 3: remove any duplicates of A$sample2
# note: this assumes that the 5 digit number will always come before the number+extension

A<-A[!duplicated(A$sample2),]

#      sample value1 value2 first sample2
# 1     10003      1      4     4   10003
# 3  10007_q1      3      1    45   10007
# 4     10008      2      2    NA   10008
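
The same idea can be sketched without a loop, applied to the sample data from the question. This is only one possible reading of the rules, not a definitive implementation: it assumes the first five characters of every name are the numeric ID, it ranks an exact name ahead of a `_Qi1` name ahead of any other suffix, and it keeps only rows that have a partner in B. The names `key`, `priority`, `A_best` and `result` are made up for illustration, and only one of A's value columns is reproduced.

```r
# Sample data from the question (Sample plus one value column of A).
A <- data.frame(
  Sample = c("10003", "10003_Qi1", "10003_Qi2", "10017", "10018", "10025_Qi",
             "10040_Qi1", "10040_ArT1", "10125", "10140_Ar1", "10225"),
  value1 = c("a", "a", "b", "b", "b", "o", "x", "e", "p", "w", "z"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Sample = c("10003", "10004", "10018", "10025", "10032", "10040",
             "10140", "10132", "10200", "10324", "10400"),
  first = c(4, 8, 45, 85, 7, 54, 2, 8, 65, 9, 32),
  stringsAsFactors = FALSE
)

# Assumption: the first five characters of the sample name are the ID.
A$key <- substr(A$Sample, 1, 5)

# Assumed priority within an ID: 1 = exact name, 2 = "_Qi1" suffix, 3 = anything else.
A$priority <- ifelse(A$Sample == A$key, 1L,
              ifelse(grepl("_Qi1$", A$Sample), 2L, 3L))

# Keep the best-ranked row per ID; ties stay in A's original order.
A_best <- A[order(A$key, A$priority), ]
A_best <- A_best[!duplicated(A_best$key), ]

# Look up B by ID and drop rows of A that have no partner in B.
A_best$first <- B$first[match(A_best$key, B$Sample)]
result <- A_best[!is.na(A_best$first), c("Sample", "value1", "first")]
rownames(result) <- NULL
print(result)
```

Unlike step 3 above, this does not rely on the bare five-digit name appearing before its suffixed variants, because the rows are ranked explicitly before duplicates are dropped.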
