Fuzzy mapping in R
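I am trying to do fuzzy matching with the agrep command. I have one data frame with a column of audience responses, and another data frame that contains segments and subsegments. The audience response column contains words that are subsegment names. For example: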
pattern$audience
[1] "(Deleted) Semasio » DE: Intent » Christmas Shopping"
[2] "(Old) AddThis - UK » Auto » General » Auto Enthusiasts"
[3] "(Old) AddThis - UK » Auto » General » Auto Intenders"
[4] "(Old) AddThis - UK » Financial » Social » Financial Shoppers"
[5] "(Old) AddThis - UK » Food » Social"
[6] "(Old) AddThis - UK » Health » Social » Health Influencers"
Similarly, I have another data frame named x that contains the segments and subsegments:
x$segment   x$subsegment
Shopping    Financial shoppers
Travel      Travel Europe
Shopping    Christmas shopping
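I want to write a function that does fuzzy matching between pattern$audience and x$subsegment and returns the subsegment for each audience response in a new column, pattern$subseg.
The resulting dataset I need should look like this: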
pattern$audience x$segment x$subsegment
[1] "(Deleted) Semasio » DE: Intent » Christmas C" Shopping Christmas shopping
[2] "(Old) AddThis - UK » Auto » General » Auto Enthusiasts"
[3] "(Old) AddThis - UK » Auto » General » Auto Intenders"
[4] "(Old) AddThis - UK » Financial » Social » Financial Shoppers" Shopping Financial shoppers
[5] "(Old) AddThis - UK » Food » Social"
[6] "(Old) AddThis - UK » Health » Social » Health Influencers"
This is the code I tried to write, but it does not return the output I want:
x <- rename(x, c("Segment" = "segment", "Sub Segment" = "subseg"))
names(x)
y <- as.data.frame(x$subseg)
y <- rename(y, c("x$subseg" = "subseg"))
n.match <- function(pattern, x, ...) {
  for (i in 1:nrow(pattern)) {
    x <- (agrep(y, pattern$audience[i],
                ignore.case = TRUE, value = TRUE))
    x <- paste0(x, "")
    pattern$subseg[i] <- x
  }
  head(pattern)
}
Can anyone help me correct my mistake? Thank you very much for your answers.
We can try this:
pattern <- c("(Deleted) Semasio » DE: Intent » Christmas C",
"(Old) AddThis - UK » Auto » General » Auto Enthusiasts",
"(Old) AddThis - UK » Auto » General » Auto Intenders",
"(Old) AddThis - UK » Financial » Social » Financial Shoppers",
"(Old) AddThis - UK » Food » Social",
"(Old) AddThis - UK » Financial » Social » Financial Shoppers",
"(Old) AddThis - UK » Health » Social » Health Influencers")
pattern <- data.frame(audiance=pattern)
x <- read.csv(text='segment, subsegment
Shopping, Financial shoppers
Travel, Travel Europe
Enthusiasts, Auto Enthusiasts
Shopping, Christmas shopping', stringsAsFactors=FALSE, strip.white=TRUE)  # strip.white drops the space after each comma
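# Vectorize agrep over its 'pattern' argument; SIMPLIFY = FALSE always returns a list
vagrep <- Vectorize(agrep, 'pattern', SIMPLIFY = FALSE)
pattern$subsegment <- ''
# matches[[i]] holds the rows of pattern$audiance that fuzzily match x$subsegment[i]
matches <- vagrep(x$subsegment, pattern$audiance)
for (i in seq_along(matches)) {
  if (length(matches[[i]]) > 0) pattern$subsegment[matches[[i]]] <- x$subsegment[i]
}
pattern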
# audiance subsegment
#1 (Deleted) Semasio » DE: Intent » Christmas C
#2 (Old) AddThis - UK » Auto » General » Auto Enthusiasts Auto Enthusiasts
#3 (Old) AddThis - UK » Auto » General » Auto Intenders
#4 (Old) AddThis - UK » Financial » Social » Financial Shoppers Financial shoppers
#5 (Old) AddThis - UK » Food » Social
#6 (Old) AddThis - UK » Financial » Social » Financial Shoppers Financial shoppers
#7 (Old) AddThis - UK » Health » Social » Health Influencers
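The question also asks for the segment next to each matched subsegment, which the code above does not fill in. A minimal sketch of one way to do that is below. It is not part of the original answer: the helper name fuzzy_map and its arguments audience and lookup are hypothetical, the sample data is a shortened version of the question's data, and it assumes the first matching subsegment should win when several match.

```r
# Hypothetical helper (not from the original answer): for each audience string,
# take the first subsegment that fuzzily matches and return it with its segment.
fuzzy_map <- function(audience, lookup, max.distance = 0.1) {
  out <- data.frame(audience   = audience,
                    segment    = NA_character_,
                    subsegment = NA_character_,
                    stringsAsFactors = FALSE)
  for (i in seq_len(nrow(lookup))) {
    # agrepl returns one TRUE/FALSE per audience string for this subsegment
    hit <- agrepl(lookup$subsegment[i], audience,
                  max.distance = max.distance, ignore.case = TRUE)
    # only fill rows that an earlier subsegment has not already claimed
    fill <- hit & is.na(out$subsegment)
    out$segment[fill]    <- lookup$segment[i]
    out$subsegment[fill] <- lookup$subsegment[i]
  }
  out
}

# Shortened sample data based on the question (an assumption for illustration)
audience <- c("(Deleted) Semasio » DE: Intent » Christmas Shopping",
              "(Old) AddThis - UK » Auto » General » Auto Enthusiasts",
              "(Old) AddThis - UK » Financial » Social » Financial Shoppers",
              "(Old) AddThis - UK » Food » Social")
lookup <- data.frame(segment    = c("Shopping", "Travel", "Shopping"),
                     subsegment = c("Financial shoppers", "Travel Europe",
                                    "Christmas shopping"),
                     stringsAsFactors = FALSE)

result <- fuzzy_map(audience, lookup)
result
```

With these inputs, rows 1 and 3 should pick up "Christmas shopping" and "Financial shoppers" (both under the segment "Shopping"), and the other rows stay NA. Raising max.distance makes the matching more permissive. A truncated string such as "Christmas C" would probably still need a much larger value, since agrep counts the edits required to find the whole subsegment name inside the audience string.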