Access indices of each CV fold for custom metric function in caret
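I want to define my own custom metric function in caret, but inside this function I want to use additional information that is not used for training.
For that, I need the indices (row numbers) of the data used for validation in the current fold.
Here is a toy example:
Generate data: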
library(caret)
set.seed(1234)
x <- matrix(rnorm(10),nrow=5,ncol=2 )
y <- factor(c("y","n","y","y","n"))
priors <- c(1,3,2,7,9)
Here is my example metric function, which should use information from the priors vector:
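my.metric <- function(data,
                      lev = NULL,
                      model = NULL) {
  # INDICES.OF.DATA is a placeholder: the row numbers of the observations
  # held out for validation in the current fold (this is what I need)
  out <- priors[INDICES.OF.DATA] + data$pred/data$obs
  names(out) <- "MYMEASURE"
  out
}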
myControl <- trainControl(summaryFunction = my.metric, method="repeatedcv", number=10, repeats=2)
fit <- train(y=y, x=x, metric = "MYMEASURE", method="gbm", trControl = myControl)
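To make this clearer: I would use this in a survival setting where priors are days (survival times), build a Surv object from them, and compute a survival AUC inside the metric function.
How can I do this in caret?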
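You can access the row numbers with data$rowIndex. Note that the summary function should return a single number as its metric (e.g. ROC, Accuracy, RMSE, ...). The function above appears to return a vector whose length equals the number of observations in the held-out CV data.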
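A minimal sketch of that idea, runnable without caret: the data frame below is a hand-built stand-in for one held-out fold (assuming caret passes obs, pred and rowIndex columns, as described above), and the metric itself, a prior-weighted accuracy, is a hypothetical choice made only to show how the per-row values collapse into one named number.

```r
# Extra per-row information, aligned with the rows of the training data
priors <- c(1, 3, 2, 7, 9)

my.metric <- function(data, lev = NULL, model = NULL) {
  # rowIndex holds the original row numbers of the held-out observations,
  # so it can index any vector that is aligned with the training rows
  fold_priors <- priors[data$rowIndex]
  # Hypothetical metric: accuracy weighted by the priors of the held-out rows.
  # weighted.mean() collapses the per-row values into a single number.
  out <- weighted.mean(data$pred == data$obs, w = fold_priors)
  names(out) <- "MYMEASURE"
  out
}

# Hand-built stand-in for one held-out fold containing rows 2 and 5
fold <- data.frame(
  obs      = factor(c("n", "n"), levels = c("n", "y")),
  pred     = factor(c("n", "y"), levels = c("n", "y")),
  rowIndex = c(2L, 5L)
)
my.metric(fold)  # returns c(MYMEASURE = 0.25): weights 3 and 9, only the first row is correct
```

The same function could then be passed unchanged as trainControl(summaryFunction = my.metric, ...).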
If you want to see the resamples and their predictions, you can add print(data) to the my.metric function.
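If you would rather not edit the metric itself, a small wrapper can do the printing. with_peek() below is a hypothetical helper, not part of caret: it prints the first rows of whatever data frame is handed to the summary function and then delegates to the wrapped function unchanged. The toy fold is invented for illustration.

```r
# Hypothetical helper (not part of caret): print what the summary function
# receives, then compute the metric with the wrapped function unchanged.
with_peek <- function(summary_fun, n = 6) {
  function(data, lev = NULL, model = NULL) {
    print(head(data, n))
    summary_fun(data, lev = lev, model = model)
  }
}

# Toy summary function and toy fold, just to exercise the wrapper
toy_summary <- function(data, lev = NULL, model = NULL) {
  c(Accuracy = mean(data$pred == data$obs))
}
fold <- data.frame(
  obs      = factor(c("n", "y"), levels = c("n", "y")),
  pred     = factor(c("n", "n"), levels = c("n", "y")),
  rowIndex = c(2L, 5L)
)
peeked <- with_peek(toy_summary)
peeked(fold)  # prints the two rows, then returns c(Accuracy = 0.5)
```

With caret it would be used as trainControl(summaryFunction = with_peek(my.metric), ...). The summary function is called many times during tuning, so expect a lot of output.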
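Here is an example using your data (scaled up a bit) and Metrics::auc as the performance measure, computed after multiplying the predicted class probabilities by the priors: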
library(caret)
library(Metrics)
set.seed(1234)
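x <- matrix(rnorm(100), nrow=100, ncol=2 )
# note: the 100 values are recycled to fill both columns, so x[, 1] and x[, 2] are identical
colnames(x) <- c("x1", "x2")  # name the predictors; train() may complain about unnamed columns
set.seed(1234)
y <- factor(sample(x = c("y", "n"), size = 100, replace = TRUE))
priors <- runif(n = length(y), min = 0.1, max = 0.9)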
my.metric <- function(data, lev = NULL, model = NULL)
{
  # The performance metric should be a single number
  # data$y are the predicted probabilities of
  # the observations in the fold belonging to class "y"
  out <- Metrics::auc(actual = as.numeric(data$obs == "y"),
                      predicted = priors[data$rowIndex] * data$y)
  names(out) <- "MYMEASURE"
  out
}
fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           classProbs = TRUE,
                           repeats = 2,
                           summaryFunction = my.metric)
set.seed(1234)
fit <- train(y = y,
             x = x,
             metric = "MYMEASURE",
             method = "gbm",
             verbose = FALSE,
             trControl = fitControl)
fit
# Stochastic Gradient Boosting
#
# 100 samples
# 2 predictor
# 2 classes: 'n', 'y'
#
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 2 times)
#
# Summary of sample sizes: 90, 90, 90, 90, 90, 89, ...
#
# Resampling results across tuning parameters:
#
# interaction.depth n.trees MYMEASURE MYMEASURE SD
# 1 50 0.5551667 0.2348496
# 1 100 0.5682500 0.2297383
# 1 150 0.5797500 0.2274042
# 2 50 0.5789167 0.2246845
# 2 100 0.5941667 0.2053826
# 2 150 0.5900833 0.2186712
# 3 50 0.5750833 0.2291999
# 3 100 0.5488333 0.2312470
# 3 150 0.5577500 0.2202638
#
# Tuning parameter 'shrinkage' was held constant at a value of 0.1
# Tuning parameter 'n.minobsinnode' was held constant at a value of 10
# MYMEASURE was used to select the optimal model using the largest value.
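Metrics::auc is an extra dependency. As an alternative, the same quantity can be computed in base R with the rank-based (Mann-Whitney) formula; auc_rank() below is a hypothetical helper, not from caret or Metrics, that could stand in for Metrics::auc inside my.metric. In a small CV fold one class may be missing, in which case AUC is undefined; this sketch returns NA for that case.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive gets a higher score than a randomly chosen negative (ties count 1/2).
auc_rank <- function(actual, predicted) {
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  if (n_pos == 0 || n_neg == 0) return(NA_real_)  # undefined with only one class
  r <- rank(predicted)  # average ranks take care of tied scores
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_rank(actual = c(0, 0, 1, 1), predicted = c(0.1, 0.4, 0.35, 0.8))  # returns 0.75
```

Inside my.metric, the call would become auc_rank(actual = as.numeric(data$obs == "y"), predicted = priors[data$rowIndex] * data$y).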
I don't know much about survival analysis, but I hope this helps.
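For the survival use case in the question, one common stand-in for a survival AUC is Harrell's concordance index. This is a swapped-in technique, not something caret or the answer above provides. surv_time and surv_status are hypothetical vectors aligned with the training rows (the role the question gives to priors), and using the predicted probability of class "y" as the risk score is an assumption. A real analysis would probably rely on the survival package rather than this simple loop.

```r
# Harrell's C: among comparable pairs (the earlier time is an observed event),
# the fraction where the earlier-failing subject has the higher risk score.
harrell_c <- function(time, status, risk) {
  num <- 0
  den <- 0
  for (i in seq_along(time)) {
    if (status[i] != 1) next   # censored subjects cannot anchor a comparable pair
    later <- time > time[i]    # subjects still at risk after time[i]
    den <- den + sum(later)
    num <- num + sum(risk[i] > risk[later]) + 0.5 * sum(risk[i] == risk[later])
  }
  if (den == 0) NA_real_ else num / den
}

# Hypothetical per-row survival data, aligned with the training rows
surv_time   <- c(1, 3, 2, 7, 9)
surv_status <- c(1, 1, 0, 1, 0)

surv.metric <- function(data, lev = NULL, model = NULL) {
  idx <- data$rowIndex  # original row numbers of the held-out observations
  out <- harrell_c(time = surv_time[idx], status = surv_status[idx],
                   risk = data$y)  # assumption: P(class "y") used as the risk score
  names(out) <- "CINDEX"
  out
}

# Toy fold: rows 1, 2 and 4, with risk decreasing as survival time increases
fold <- data.frame(rowIndex = c(1L, 2L, 4L), y = c(0.9, 0.6, 0.2))
surv.metric(fold)  # returns c(CINDEX = 1): every comparable pair is concordant
```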