Partial string match between columns of multiple data frames

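I have a list of data frames (df1, df2, df3) whose columns I want to match against another data frame (df), replacing the value only where there is a match. The match should be based on a string passed in when the function is called, and it should be a partial match. In other words, here it should only target fields that contain the string "TEXT", and it should work for cases such as TEXT123 and TEXTabc. I haven't gotten very far on my own...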

df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors=FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors=FALSE)
df3 <- data.frame(name = c("c","TEXT123","a"), column_A = 7:9, stringsAsFactors=FALSE)
df <- data.frame(name = c("TEXT333","TEXT123","a", "TEXT345", "k", "l", "b","c", "f"), column_B = 11:19, stringsAsFactors=FALSE)


list<-c(df1, df2, df3)

Example for df1:

partial_match <- function(column_A$df1, column_B, TEXT, df) {
  df1_new <-df1
  df1_new[, column_B] <- ifelse(grepl("TEXT.*", df1[, column_A]),
                           df[, column_B] - nchar(TEXT),
                           df[, column_B])
  df1_new
}

Desired result for df1:

name     column_A  column_B
TEXT333         1        11
b               2         b
c               3         c

Here is one way to do it with a for loop. You were close! Note that I renamed your list of data frames to dfs to avoid confusion with list().

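Do you think you might run into cases where a single data frame contains more than one match? If so, what I show below will not work without a few more lines, because it assumes exactly one matching name per data frame.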

df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors=FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors=FALSE)
df3 <- data.frame(name = c("c","TEXT123","a"), column_A = 7:9, stringsAsFactors=FALSE)
dfs <- list(df1, df2, df3)
df <- data.frame(name = c("TEXT333","TEXT123","a", "TEXT345", "k", "l", "b","c", "f"), column_B = 11:19, stringsAsFactors=FALSE)

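# loop over all data frames in the list
for (i in seq_along(dfs)) {

  # get the name that contains "TEXT" (assumes exactly one match per data frame);
  # a plain "TEXT" pattern is enough, since grep() already matches substrings
  val <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE)

  # use that name to pull the replacement value from the reference df
  dfs[[i]][dfs[[i]]$name == val, "column_A"] <- df[df$name == val, "column_B"]
}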

Updated answer that handles multiple matches within the same df:

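for (i in seq_along(dfs)) {
  # all names in this data frame that contain "TEXT"
  vals <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE)
  # update each matching row from the reference df
  for (val in vals) {
    dfs[[i]][dfs[[i]]$name == val, "column_A"] <- df[df$name == val, "column_B"]
  }
}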
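
Since the question asks for a function that takes the search string as an argument, the same idea can probably also be written without explicit loops. The following is only a sketch under a few assumptions: the helper name update_partial_matches and its arguments target_col and ref_col are made up for illustration, the pattern is treated as a literal substring (fixed = TRUE), and names in the reference data frame are assumed to be unique.

```r
# Hypothetical helper (not part of the original answer): for each data frame
# in `dfs`, overwrite `target_col` with the value of `ref_col` from `ref` on
# rows whose `name` contains `pattern` and also occurs in `ref$name`.
update_partial_matches <- function(dfs, ref, pattern,
                                   target_col = "column_A",
                                   ref_col = "column_B") {
  lapply(dfs, function(d) {
    # rows whose name contains the pattern as a literal substring
    hit <- grepl(pattern, d$name, fixed = TRUE)
    # position of each name in the reference data frame (NA when absent)
    idx <- match(d$name, ref$name)
    # only update rows that both contain the pattern and exist in `ref`
    ok <- hit & !is.na(idx)
    d[[target_col]][ok] <- ref[[ref_col]][idx[ok]]
    d
  })
}

df1 <- data.frame(name = c("TEXT333", "b", "c"), column_A = 1:3, stringsAsFactors = FALSE)
df2 <- data.frame(name = c("b", "TEXT345", "d"), column_A = 4:6, stringsAsFactors = FALSE)
df3 <- data.frame(name = c("c", "TEXT123", "a"), column_A = 7:9, stringsAsFactors = FALSE)
df  <- data.frame(name = c("TEXT333", "TEXT123", "a", "TEXT345", "k", "l", "b", "c", "f"),
                  column_B = 11:19, stringsAsFactors = FALSE)

res <- update_partial_matches(list(df1, df2, df3), df, "TEXT")
res[[1]]$column_A  # 11 2 3
res[[2]]$column_A  # 4 14 6
res[[3]]$column_A  # 7 12 9
```

Because match() returns one position per row, this version handles several matching rows in the same data frame, and it leaves a data frame unchanged when nothing matches.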