Specifying number of trials, bootstrap
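For an assignment, I am applying mixture modelling with the mixtools package in R. When I try to determine the optimal number of components with a bootstrap, I get the following error: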
Error in boot.comp(y, x, N = NULL, max.comp = 2, B = 5, sig = 0.05, arbmean = TRUE, :
Number of trials must be specified!
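I found out that I have to fill in N, which the documentation describes as: an n-vector of number of trials for the logistic regression type logisregmix. If NULL, then N is an n-vector of 1s for binary logistic regression.
However, I do not know how to find out what N should actually be to make my bootstrap work.
Link to my data: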
https://www.kaggle.com/blastchar/telco-customer-churn
My code:
data <- read.csv("Desktop/WA_Fn-UseC_-Telco-Customer-Churn.csv", stringsAsFactors = FALSE,
na.strings = c("NA", "N/A", "Unknown*", "NULL", ".P"))
data <- droplevels(na.omit(data))
testdf <- data[c(5033:7032),]  # take the hold-out rows before data is truncated
data <- data[c(1:5032),]
data <- subset(data, select = -customerID)
set.seed(100)
library(plyr)
library(mixtools)
data$Churn <- revalue(data$Churn, c("Yes"=1, "No"=0))
y <- as.numeric(data$Churn)
x <- model.matrix(Churn ~ . , data = data)
x <- x[, -1] #remove intercept
x <-x[,-c(7, 11, 13, 15, 17, 19, 21)] #multicollinearity
a <- boot.comp(y, x, N = NULL, max.comp = 2, B = 100,
sig = 0.05, arbmean = TRUE, arbvar = TRUE,
mix.type = "logisregmix", hist = TRUE)
Here is some more information about my predictors:
dput(x[1:4,])
structure(c(0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
34, 2, 45, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 1, 1, 0, 29.85, 56.95, 53.85, 42.3, 29.85, 1889.5, 108.15,
1840.75), .Dim = c(4L, 23L), .Dimnames = list(c("1", "2", "3",
"4"), c("genderMale", "SeniorCitizen", "PartnerYes", "DependentsYes",
"tenure", "PhoneServiceYes", "MultipleLinesYes", "InternetServiceFiber optic",
"InternetServiceNo", "OnlineSecurityYes", "OnlineBackupYes",
"DeviceProtectionYes", "TechSupportYes", "StreamingTVYes", "StreamingMoviesYes",
"ContractOne year", "ContractTwo year", "PaperlessBillingYes",
"PaymentMethodCredit card (automatic)", "PaymentMethodElectronic check",
"PaymentMethodMailed check", "MonthlyCharges", "TotalCharges"
)))
My response variable is binary.
I hope you can help me out!
Looking at the source code of mixtools::boot.comp, which is scary because it is over 800 lines long and badly in need of refactoring, the offending lines are:
if (mix.type == "logisregmix") {
if (is.null(N))
stop("Number of trials must be specified!")
Despite what the documentation says, N must be specified.
Try setting it to a vector of 1s: N = rep(1, length(y)) or N = rep(1, nrow(x)).
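As a minimal sketch of that suggestion, the snippet below builds N for a binary response and checks that it lines up with y and x. The toy x and y are hypothetical stand-ins rather than the Telco data, and the boot.comp call is left commented out because it simply repeats the call from the question with N filled in.

```r
# Toy stand-ins for the asker's objects (hypothetical data, not the Telco set).
set.seed(100)
x <- cbind(x1 = rnorm(20), x2 = rnorm(20))  # predictor matrix, one row per observation
y <- rbinom(20, size = 1, prob = 0.5)       # binary 0/1 response

# One Bernoulli trial per observation, which is the binary-logistic case.
N <- rep(1, length(y))

# Sanity checks: N matches y and x in length, and y never exceeds its trial count.
stopifnot(length(N) == nrow(x), all(y %in% c(0, 1)), all(y <= N))

# With mixtools loaded, the call from the question would then become:
# a <- boot.comp(y, x, N = N, max.comp = 2, B = 100,
#                sig = 0.05, arbmean = TRUE, arbvar = TRUE,
#                mix.type = "logisregmix", hist = TRUE)
```

Since y and the rows of x describe the same observations, length(y) and nrow(x) should give the same N.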
In fact, if you look at mixtools::logisregmixEM, the internal function called by boot.comp, you will see how N is set if it is NULL:
n <- length(y)
if (is.null(N)) {
N = rep(1, n)
}
Too bad that this is never reached when N is NULL, because boot.comp has already stopped with an error by then. This is a bug.
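The control flow behind this can be reproduced with two small functions. This is only an illustration of the pattern: outer_boot and inner_fit are hypothetical names, not mixtools code, and they copy just the two NULL checks quoted above.

```r
# Hypothetical stand-ins that mimic the control flow only; not mixtools code.
inner_fit <- function(y, N = NULL) {
  n <- length(y)
  if (is.null(N)) {
    N <- rep(1, n)  # default: one trial per observation
  }
  N
}

outer_boot <- function(y, N = NULL) {
  if (is.null(N))
    stop("Number of trials must be specified!")  # fires before inner_fit ever sees N
  inner_fit(y, N)
}

y <- c(0, 1, 1, 0)

# Called directly, the inner function applies its default.
default_N <- inner_fit(y)

# Through the outer function, the same NULL triggers the error instead.
res <- tryCatch(outer_boot(y), error = function(e) conditionMessage(e))

# Passing N explicitly gets past the outer check.
explicit_N <- outer_boot(y, N = rep(1, length(y)))
```

The default inside the inner function is fine on its own; it is the earlier check in the caller that makes it unreachable, which is why supplying N explicitly works around the problem.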