Partial match string between columns for multiple dataframes
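I have a list of data frames (df1, df2, df3) whose columns I want to match against another data frame (df), replacing a value only when there is a match. The match should be based on a string specified when the function is run, as a partial match. In other words, here it should only target fields that contain the string "TEXT", and it should work for cases such as TEXT123 and TEXTabc. I didn't get very far on my own...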
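As a minimal sketch of the matching step alone, assuming "contains TEXT" means the string may appear anywhere in the value: grepl() with fixed = TRUE treats the run-time string literally (so regex characters in it have no special meaning), while startsWith() only accepts it as a prefix. The helper names are hypothetical.

```r
# Hypothetical helpers: both take the search string at run time.
contains_text    <- function(x, pattern) grepl(pattern, x, fixed = TRUE)  # match anywhere
starts_with_text <- function(x, pattern) startsWith(x, pattern)         # match only as a prefix

x <- c("TEXT123", "TEXTabc", "b", "myTEXT")
contains_text(x, "TEXT")     # TRUE TRUE FALSE TRUE
starts_with_text(x, "TEXT")  # TRUE TRUE FALSE FALSE
```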
df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors=FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors=FALSE)
df3 <- data.frame(name = c("c","TEXT123","a"), column_A = 7:9, stringsAsFactors=FALSE)
df <- data.frame(name = c("TEXT333","TEXT123","a", "TEXT345", "k", "l", "b","c", "f"), column_B = 11:19, stringsAsFactors=FALSE)
list <- list(df1, df2, df3)  # list() keeps each data frame intact; c() would flatten them into their columns
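Combining data frames with c() and with list() gives quite different objects. A small check using two of the data frames from the question:

```r
df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors = FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors = FALSE)

flat   <- c(df1, df2)     # plain list of the 4 underlying columns
nested <- list(df1, df2)  # list holding the 2 data frames

length(flat)        # 4
names(flat)         # "name" "column_A" "name" "column_A"
length(nested)      # 2
class(nested[[1]])  # "data.frame"
```

Only the nested form can be looped over one data frame at a time.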
Example for df1:
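partial_match <- function(df1, df, TEXT) {
  df1_new <- df1
  # TRUE for names that contain the string passed in TEXT (e.g. "TEXT")
  hit <- grepl(TEXT, df1$name, fixed = TRUE)
  # look up column_B in df by name for matches; keep the original name otherwise
  df1_new$column_B <- ifelse(hit, df$column_B[match(df1$name, df$name)], df1$name)
  df1_new
}
partial_match(df1, df, "TEXT")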
Expected result for df1:
name column_A column_B
TEXT333 1 11
b 2 b
c 3 c
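One way to get that result for every data frame in the list is to look the values up with match() and apply the function with lapply(). This is a sketch that assumes names without a match should keep the name itself, as in the expected output above; add_partial_match() is a hypothetical helper.

```r
df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors = FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors = FALSE)
df3 <- data.frame(name = c("c","TEXT123","a"), column_A = 7:9, stringsAsFactors = FALSE)
df  <- data.frame(name = c("TEXT333","TEXT123","a","TEXT345","k","l","b","c","f"),
                  column_B = 11:19, stringsAsFactors = FALSE)

# Hypothetical helper: adds column_B, taking the value from `ref` for names that
# contain `pattern` and keeping the name itself otherwise.
add_partial_match <- function(d, ref, pattern) {
  hit <- grepl(pattern, d$name, fixed = TRUE)
  looked_up <- ref$column_B[match(d$name, ref$name)]
  # ifelse() mixes numbers and names here, so column_B ends up as character
  d$column_B <- ifelse(hit, looked_up, d$name)
  d
}

res <- lapply(list(df1, df2, df3), add_partial_match, ref = df, pattern = "TEXT")
res[[1]]$column_B  # "11" "b" "c"
```

Because one column mixes numbers and names, column_B is character; filling non-matches with NA instead would keep it numeric.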
Here is one way to do it with a for loop. You were close! Note that I renamed your list of data frames to dfs to avoid confusion with the base function list().
Do you expect to have multiple matches within the same data frame? If so, what I show below won't work without a few more lines.
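The limitation comes from how `==` behaves when both sides have several elements: it compares position by position and recycles the shorter vector. A small illustration with made-up names (not from the question):

```r
names_in_df <- c("TEXT333", "b", "TEXT345")
vals <- c("TEXT333", "TEXT345")  # two matches in the same data frame

# `==` compares element-wise and recycles `vals`, so row 3 is compared against
# "TEXT333" instead of "TEXT345" (R also warns about the length mismatch)
suppressWarnings(names_in_df == vals)  # TRUE FALSE FALSE

# `%in%` checks each name against the whole set of matches instead
names_in_df %in% vals                  # TRUE FALSE TRUE
```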
df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors=FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors=FALSE)
df3 <- data.frame(name = c("c","TEXT123","a"), column_A = 7:9, stringsAsFactors=FALSE)
dfs <- list(df1, df2, df3)
df <- data.frame(name = c("TEXT333","TEXT123","a", "TEXT345", "k", "l", "b","c", "f"), column_B = 11:19, stringsAsFactors=FALSE)
# loop over all dataframes in your list
for(i in seq_along(dfs)){
  # get the name that contains "TEXT" (fixed = TRUE matches it literally)
  val <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE, fixed = TRUE)
  # use that name to pull the value from the reference df
  dfs[[i]][dfs[[i]]$name == val,"column_A"] <- df[df$name == val,"column_B"]
}
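A loop-free alternative, sketched under the assumption that only column_A should change and that names absent from df should be left alone: match() finds the position of each matching name in df, so several matches in one data frame are handled in a single assignment. update_matches() is a hypothetical helper name.

```r
df1 <- data.frame(name = c("TEXT333","b","c"), column_A = 1:3, stringsAsFactors = FALSE)
df2 <- data.frame(name = c("b","TEXT345","d"), column_A = 4:6, stringsAsFactors = FALSE)
df3 <- data.frame(name = c("c","TEXT123","a"), column_A = 7:9, stringsAsFactors = FALSE)
df  <- data.frame(name = c("TEXT333","TEXT123","a","TEXT345","k","l","b","c","f"),
                  column_B = 11:19, stringsAsFactors = FALSE)
dfs <- list(df1, df2, df3)

# Hypothetical helper: overwrite column_A for every row whose name contains
# `pattern` and also appears in ref$name; all other rows stay unchanged.
update_matches <- function(d, ref, pattern) {
  hit <- which(grepl(pattern, d$name, fixed = TRUE))  # rows whose name contains pattern
  idx <- match(d$name[hit], ref$name)                 # their positions in ref (NA if absent)
  ok  <- !is.na(idx)                                  # skip names missing from ref
  d$column_A[hit[ok]] <- ref$column_B[idx[ok]]
  d
}

dfs <- lapply(dfs, update_matches, ref = df, pattern = "TEXT")
```

With this data, column_A becomes 11 2 3 in the first data frame, 4 14 6 in the second and 7 12 9 in the third.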
Updated answer that accounts for multiple matches in the same df:
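for(i in seq_along(dfs)){
  # all names in this data frame that contain "TEXT"
  vals <- grep(pattern = "TEXT", x = dfs[[i]]$name, value = TRUE, fixed = TRUE)
  # update the rows one name at a time so each gets its own value from df
  for(val in vals){
    dfs[[i]][dfs[[i]]$name == val, "column_A"] <- df[df$name == val,"column_B"]
  }
}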
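One caveat with the updated loop: a name such as TEXTabc that contains "TEXT" but does not occur in df gives an empty replacement value, and the assignment then fails with an error. A guarded variant, shown on a hypothetical one-element list whose data frame holds two matches and one unmatched name:

```r
df <- data.frame(name = c("TEXT333","TEXT123","a","TEXT345","k","l","b","c","f"),
                 column_B = 11:19, stringsAsFactors = FALSE)
# hypothetical test data: two names found in df, one ("TEXTabc") that is not
dfs <- list(data.frame(name = c("TEXT333", "TEXTabc", "TEXT345"),
                       column_A = 1:3, stringsAsFactors = FALSE))

for (i in seq_along(dfs)) {
  vals <- grep("TEXT", dfs[[i]]$name, value = TRUE, fixed = TRUE)
  for (val in vals) {
    new_val <- df$column_B[df$name == val]
    # skip names that contain "TEXT" but are missing from df;
    # use the first value if df happens to repeat a name
    if (length(new_val) > 0) {
      dfs[[i]][dfs[[i]]$name == val, "column_A"] <- new_val[1]
    }
  }
}
dfs[[1]]$column_A  # 11 2 14
```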