R: apply a user-defined function to data frame columns
In R, I defined a function to compute the intersection between two strings:
containedin <- function(t1,t2){
return length(Reduce(intersect, strsplit(c(t1,t2), "\\s+")))
}
I want to apply this function to a data frame with two string columns:
data.selected[c('keywords','title')]
keywords title
1 Samsung UN48H6350 48" Samsung UN48H6350 48" Full 1080p Smart HDTV 120Hz with Wi-Fi + Visa Gift Card
2 Samsung UN48H6350 48" Samsung UN48H6350 48" Full HD Smart LED TV -Bundle- (See Below for Contents)
3 Samsung UN48H6350 48" Samsung UN48H6350 48" Class Full HD Smart LED TV -BUNDLE- See below Details
4 Samsung UN48H6350 48" Samsung UN48H6350 48" Full HD Smart LED TV With BD-H5100 Blu-ray Disc Player
5 Samsung UN48H6350 48" Samsung UN48H6350 48" Smart 1080p Clear Motion Rate 240 LED HDTV
6 Samsung UN48H6350 48" Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi
7 Samsung UN48H6350 48" Samsung 6350 Series UN48H6350 48" 1080p HD LED LCD Internet TV NEW
8 Samsung UN48H6350 48" Samsung Un48h6350af 75" 1080p Led-lcd Tv - 16:9 - Hdtv 1080p - (un75h6350afxza)
9 Samsung UN48H6350 48" Samsung UN48H6350 - 48" HD 1080p Smart HDTV 120Hz Bundle
10 Samsung UN48H6350 48" Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi, (R#416)
How can I use an apply function on these two columns and return a new column with the results?
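First of all, your return statement should indeed give you an error: in R, return is a function and has to be called with parentheses, as in return(x). You probably meant: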
containedin <- function(t1,t2){
length(Reduce(intersect, strsplit(c(t1,t2), "\\s+")))
}
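As a quick sanity check, the function can be tried on a single pair of strings before applying it to whole columns. The sketch below uses an explicit return(...) call, which should behave the same as the implicit-return version, and takes its two input strings from row 1 of the sample data; the variable name n is only illustrative.

```r
# return is a function in R, so its value goes in parentheses.
# The last expression of a function body is also returned implicitly,
# so the explicit return(...) here is optional.
containedin <- function(t1, t2) {
  return(length(Reduce(intersect, strsplit(c(t1, t2), "\\s+"))))
}

# Strings copied from row 1 of the sample data
n <- containedin('Samsung UN48H6350 48"',
                 'Samsung UN48H6350 48" Full 1080p Smart HDTV 120Hz with Wi-Fi + Visa Gift Card')
print(n)  # number of whitespace-separated tokens shared by both strings
```

Note that intersect compares tokens exactly, so matching is case-sensitive and punctuation attached to a word counts as part of the token.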
In any case, you can use mapply to solve your problem:
mapply(containedin,
as.character(data.selected[, 'keywords']),
as.character(data.selected[, 'title']))
The as.character calls are only needed if class(data.selected[, 'keywords']) is factor (rather than character).
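To get the result back as a new column, which is what the question asks for, the output of mapply can be assigned directly to the data frame. This is a minimal sketch under stated assumptions: df is a hypothetical two-row stand-in for data.selected, the column name overlap is made up for illustration, and USE.NAMES = FALSE is added only so the result is a plain unnamed vector.

```r
# Count the whitespace-separated tokens shared by two strings
containedin <- function(t1, t2) {
  length(Reduce(intersect, strsplit(c(t1, t2), "\\s+")))
}

# Hypothetical toy data standing in for data.selected
df <- data.frame(
  keywords = c("Samsung UN48H6350", "Samsung UN48H6350"),
  title    = c("Samsung UN48H6350 Smart HDTV", "Samsung 6350 Series LED TV"),
  stringsAsFactors = FALSE  # keep columns as character, so as.character is not needed
)

# mapply calls containedin once per row, pairing keywords[i] with title[i]
df$overlap <- mapply(containedin, df$keywords, df$title, USE.NAMES = FALSE)
print(df)
```

Without USE.NAMES = FALSE, mapply would name each element of the result after the corresponding keywords string, which is harmless but makes the printed vector noisy.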