决策树的“train”函数如何选择最佳曲调?
How does function `train` for decision tree work in selecting the Best Tune?
为了估计所谓的复杂度参数cp
(修剪树很方便),我使用了网格搜索和另一个包caret
的函数。这是代码:
dataset = dataset[3:5]
dataset$Purchased = factor(dataset$Purchased, levels = c(0, 1))
library(caTools)
set.seed(123)
split = sample.split(dataset$Purchased, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
training_set[-3] = scale(training_set[-3])
test_set[-3] = scale(test_set[-3])
library(rpart)
library(caret)
youdenSumary <- function(data, lev = NULL, model = NULL){
if (length(lev) > 2) {
stop(paste("Your outcome has", length(lev), "levels. The joudenSumary() function isn't appropriate."))
}
if (!all(levels(data[, "pred"]) == lev)) {
stop("levels of observed and predicted data do not match")
}
Sens <- caret::sensitivity(data[, "pred"], data[, "obs"], lev[1])
Spec <- caret::specificity(data[, "pred"], data[, "obs"], lev[2])
j <- (Sens + Spec)/2
out <- c(j, Spec, Sens)
names(out) <- c("j", "Spec", "Sens")
out
}
library(caret)
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 20,
search = "grid",summaryFunction = youdenSumary)
classifier = train(form = Purchased ~ ., data = training_set, method = 'rpart',
trControl=trctrl,tuneLength = 10,metric = "j")
classifier
此代码产生以下结果:
cp j Spec Sens
0.00000000 0.9133409 0.9141818 0.9125000
0.06126687 0.9205592 0.9345000 0.9066184
0.12253375 0.9205592 0.9345000 0.9066184
0.18380062 0.9205592 0.9345000 0.9066184
0.24506750 0.8009091 0.6773182 0.9245000
0.30633437 0.7884659 0.6396818 0.9372500
0.36760125 0.7884659 0.6396818 0.9372500
0.42886812 0.7884659 0.6396818 0.9372500
0.49013499 0.7884659 0.6396818 0.9372500
0.55140187 0.6588846 0.3660455 0.9517237
j was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.1838006.
我想知道算法如何为 cp
选择候选值。我阅读了 cp
的 10 个可能值;该算法在其中选择最大化 j
的值,这对我来说很清楚。但是要分配给 cp
的可能值从何而来?
数据集如下:
> dput(dataset)
structure(list(Age = c(19L, 35L, 26L, 27L, 19L, 27L, 27L, 32L,
25L, 35L, 26L, 26L, 20L, 32L, 18L, 29L, 47L, 45L, 46L, 48L, 45L,
47L, 48L, 45L, 46L, 47L, 49L, 47L, 29L, 31L, 31L, 27L, 21L, 28L,
27L, 35L, 33L, 30L, 26L, 27L, 27L, 33L, 35L, 30L, 28L, 23L, 25L,
27L, 30L, 31L, 24L, 18L, 29L, 35L, 27L, 24L, 23L, 28L, 22L, 32L,
27L, 25L, 23L, 32L, 59L, 24L, 24L, 23L, 22L, 31L, 25L, 24L, 20L,
33L, 32L, 34L, 18L, 22L, 28L, 26L, 30L, 39L, 20L, 35L, 30L, 31L,
24L, 28L, 26L, 35L, 22L, 30L, 26L, 29L, 29L, 35L, 35L, 28L, 35L,
28L, 27L, 28L, 32L, 33L, 19L, 21L, 26L, 27L, 26L, 38L, 39L, 37L,
38L, 37L, 42L, 40L, 35L, 36L, 40L, 41L, 36L, 37L, 40L, 35L, 41L,
39L, 42L, 26L, 30L, 26L, 31L, 33L, 30L, 21L, 28L, 23L, 20L, 30L,
28L, 19L, 19L, 18L, 35L, 30L, 34L, 24L, 27L, 41L, 29L, 20L, 26L,
41L, 31L, 36L, 40L, 31L, 46L, 29L, 26L, 32L, 32L, 25L, 37L, 35L,
33L, 18L, 22L, 35L, 29L, 29L, 21L, 34L, 26L, 34L, 34L, 23L, 35L,
25L, 24L, 31L, 26L, 31L, 32L, 33L, 33L, 31L, 20L, 33L, 35L, 28L,
24L, 19L, 29L, 19L, 28L, 34L, 30L, 20L, 26L, 35L, 35L, 49L, 39L,
41L, 58L, 47L, 55L, 52L, 40L, 46L, 48L, 52L, 59L, 35L, 47L, 60L,
49L, 40L, 46L, 59L, 41L, 35L, 37L, 60L, 35L, 37L, 36L, 56L, 40L,
42L, 35L, 39L, 40L, 49L, 38L, 46L, 40L, 37L, 46L, 53L, 42L, 38L,
50L, 56L, 41L, 51L, 35L, 57L, 41L, 35L, 44L, 37L, 48L, 37L, 50L,
52L, 41L, 40L, 58L, 45L, 35L, 36L, 55L, 35L, 48L, 42L, 40L, 37L,
47L, 40L, 43L, 59L, 60L, 39L, 57L, 57L, 38L, 49L, 52L, 50L, 59L,
35L, 37L, 52L, 48L, 37L, 37L, 48L, 41L, 37L, 39L, 49L, 55L, 37L,
35L, 36L, 42L, 43L, 45L, 46L, 58L, 48L, 37L, 37L, 40L, 42L, 51L,
47L, 36L, 38L, 42L, 39L, 38L, 49L, 39L, 39L, 54L, 35L, 45L, 36L,
52L, 53L, 41L, 48L, 48L, 41L, 41L, 42L, 36L, 47L, 38L, 48L, 42L,
40L, 57L, 36L, 58L, 35L, 38L, 39L, 53L, 35L, 38L, 47L, 47L, 41L,
53L, 54L, 39L, 38L, 38L, 37L, 42L, 37L, 36L, 60L, 54L, 41L, 40L,
42L, 43L, 53L, 47L, 42L, 42L, 59L, 58L, 46L, 38L, 54L, 60L, 60L,
39L, 59L, 37L, 46L, 46L, 42L, 41L, 58L, 42L, 48L, 44L, 49L, 57L,
56L, 49L, 39L, 47L, 48L, 48L, 47L, 45L, 60L, 39L, 46L, 51L, 50L,
36L, 49L), EstimatedSalary = c(19000L, 20000L, 43000L, 57000L,
76000L, 58000L, 84000L, 150000L, 33000L, 65000L, 80000L, 52000L,
86000L, 18000L, 82000L, 80000L, 25000L, 26000L, 28000L, 29000L,
22000L, 49000L, 41000L, 22000L, 23000L, 20000L, 28000L, 30000L,
43000L, 18000L, 74000L, 137000L, 16000L, 44000L, 90000L, 27000L,
28000L, 49000L, 72000L, 31000L, 17000L, 51000L, 108000L, 15000L,
84000L, 20000L, 79000L, 54000L, 135000L, 89000L, 32000L, 44000L,
83000L, 23000L, 58000L, 55000L, 48000L, 79000L, 18000L, 117000L,
20000L, 87000L, 66000L, 120000L, 83000L, 58000L, 19000L, 82000L,
63000L, 68000L, 80000L, 27000L, 23000L, 113000L, 18000L, 112000L,
52000L, 27000L, 87000L, 17000L, 80000L, 42000L, 49000L, 88000L,
62000L, 118000L, 55000L, 85000L, 81000L, 50000L, 81000L, 116000L,
15000L, 28000L, 83000L, 44000L, 25000L, 123000L, 73000L, 37000L,
88000L, 59000L, 86000L, 149000L, 21000L, 72000L, 35000L, 89000L,
86000L, 80000L, 71000L, 71000L, 61000L, 55000L, 80000L, 57000L,
75000L, 52000L, 59000L, 59000L, 75000L, 72000L, 75000L, 53000L,
51000L, 61000L, 65000L, 32000L, 17000L, 84000L, 58000L, 31000L,
87000L, 68000L, 55000L, 63000L, 82000L, 107000L, 59000L, 25000L,
85000L, 68000L, 59000L, 89000L, 25000L, 89000L, 96000L, 30000L,
61000L, 74000L, 15000L, 45000L, 76000L, 50000L, 47000L, 15000L,
59000L, 75000L, 30000L, 135000L, 100000L, 90000L, 33000L, 38000L,
69000L, 86000L, 55000L, 71000L, 148000L, 47000L, 88000L, 115000L,
118000L, 43000L, 72000L, 28000L, 47000L, 22000L, 23000L, 34000L,
16000L, 71000L, 117000L, 43000L, 60000L, 66000L, 82000L, 41000L,
72000L, 32000L, 84000L, 26000L, 43000L, 70000L, 89000L, 43000L,
79000L, 36000L, 80000L, 22000L, 39000L, 74000L, 134000L, 71000L,
101000L, 47000L, 130000L, 114000L, 142000L, 22000L, 96000L, 150000L,
42000L, 58000L, 43000L, 108000L, 65000L, 78000L, 96000L, 143000L,
80000L, 91000L, 144000L, 102000L, 60000L, 53000L, 126000L, 133000L,
72000L, 80000L, 147000L, 42000L, 107000L, 86000L, 112000L, 79000L,
57000L, 80000L, 82000L, 143000L, 149000L, 59000L, 88000L, 104000L,
72000L, 146000L, 50000L, 122000L, 52000L, 97000L, 39000L, 52000L,
134000L, 146000L, 44000L, 90000L, 72000L, 57000L, 95000L, 131000L,
77000L, 144000L, 125000L, 72000L, 90000L, 108000L, 75000L, 74000L,
144000L, 61000L, 133000L, 76000L, 42000L, 106000L, 26000L, 74000L,
71000L, 88000L, 38000L, 36000L, 88000L, 61000L, 70000L, 21000L,
141000L, 93000L, 62000L, 138000L, 79000L, 78000L, 134000L, 89000L,
39000L, 77000L, 57000L, 63000L, 73000L, 112000L, 79000L, 117000L,
38000L, 74000L, 137000L, 79000L, 60000L, 54000L, 134000L, 113000L,
125000L, 50000L, 70000L, 96000L, 50000L, 141000L, 79000L, 75000L,
104000L, 55000L, 32000L, 60000L, 138000L, 82000L, 52000L, 30000L,
131000L, 60000L, 72000L, 75000L, 118000L, 107000L, 51000L, 119000L,
65000L, 65000L, 60000L, 54000L, 144000L, 79000L, 55000L, 122000L,
104000L, 75000L, 65000L, 51000L, 105000L, 63000L, 72000L, 108000L,
77000L, 61000L, 113000L, 75000L, 90000L, 57000L, 99000L, 34000L,
70000L, 72000L, 71000L, 54000L, 129000L, 34000L, 50000L, 79000L,
104000L, 29000L, 47000L, 88000L, 71000L, 26000L, 46000L, 83000L,
73000L, 130000L, 80000L, 32000L, 74000L, 53000L, 87000L, 23000L,
64000L, 33000L, 139000L, 28000L, 33000L, 60000L, 39000L, 71000L,
34000L, 35000L, 33000L, 23000L, 45000L, 42000L, 59000L, 41000L,
23000L, 20000L, 33000L, 36000L), Purchased = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L,
1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L,
1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L,
1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L), .Label = c("0",
"1"), class = "factor")), row.names = c(NA, -400L), class = "data.frame")
> dataset = read.csv('Social_Network_Ads.csv')
> dput(dataset)
structure(list(User.ID = c(15624510L, 15810944L, 15668575L, 15603246L,
15804002L, 15728773L, 15598044L, 15694829L, 15600575L, 15727311L,
15570769L, 15606274L, 15746139L, 15704987L, 15628972L, 15697686L,
15733883L, 15617482L, 15704583L, 15621083L, 15649487L, 15736760L,
15714658L, 15599081L, 15705113L, 15631159L, 15792818L, 15633531L,
15744529L, 15669656L, 15581198L, 15729054L, 15573452L, 15776733L,
15724858L, 15713144L, 15690188L, 15689425L, 15671766L, 15782806L,
15764419L, 15591915L, 15772798L, 15792008L, 15715541L, 15639277L,
15798850L, 15776348L, 15727696L, 15793813L, 15694395L, 15764195L,
15744919L, 15671655L, 15654901L, 15649136L, 15775562L, 15807481L,
15642885L, 15789109L, 15814004L, 15673619L, 15595135L, 15583681L,
15605000L, 15718071L, 15679760L, 15654574L, 15577178L, 15595324L,
15756932L, 15726358L, 15595228L, 15782530L, 15592877L, 15651983L,
15746737L, 15774179L, 15667265L, 15655123L, 15595917L, 15668385L,
15709476L, 15711218L, 15798659L, 15663939L, 15694946L, 15631912L,
15768816L, 15682268L, 15684801L, 15636428L, 15809823L, 15699284L,
15786993L, 15709441L, 15710257L, 15582492L, 15575694L, 15756820L,
15766289L, 15593014L, 15584545L, 15675949L, 15672091L, 15801658L,
15706185L, 15789863L, 15720943L, 15697997L, 15665416L, 15660200L,
15619653L, 15773447L, 15739160L, 15689237L, 15679297L, 15591433L,
15642725L, 15701962L, 15811613L, 15741049L, 15724423L, 15574305L,
15678168L, 15697020L, 15610801L, 15745232L, 15722758L, 15792102L,
15675185L, 15801247L, 15725660L, 15638963L, 15800061L, 15578006L,
15668504L, 15687491L, 15610403L, 15741094L, 15807909L, 15666141L,
15617134L, 15783029L, 15622833L, 15746422L, 15750839L, 15749130L,
15779862L, 15767871L, 15679651L, 15576219L, 15699247L, 15619087L,
15605327L, 15610140L, 15791174L, 15602373L, 15762605L, 15598840L,
15744279L, 15670619L, 15599533L, 15757837L, 15697574L, 15578738L,
15762228L, 15614827L, 15789815L, 15579781L, 15587013L, 15570932L,
15794661L, 15581654L, 15644296L, 15614420L, 15609653L, 15594577L,
15584114L, 15673367L, 15685576L, 15774727L, 15694288L, 15603319L,
15759066L, 15814816L, 15724402L, 15571059L, 15674206L, 15715160L,
15730448L, 15662067L, 15779581L, 15662901L, 15689751L, 15667742L,
15738448L, 15680243L, 15745083L, 15708228L, 15628523L, 15708196L,
15735549L, 15809347L, 15660866L, 15766609L, 15654230L, 15794566L,
15800890L, 15697424L, 15724536L, 15735878L, 15707596L, 15657163L,
15622478L, 15779529L, 15636023L, 15582066L, 15666675L, 15732987L,
15789432L, 15663161L, 15694879L, 15593715L, 15575002L, 15622171L,
15795224L, 15685346L, 15691808L, 15721007L, 15794253L, 15694453L,
15813113L, 15614187L, 15619407L, 15646227L, 15660541L, 15753874L,
15617877L, 15772073L, 15701537L, 15736228L, 15780572L, 15769596L,
15586996L, 15722061L, 15638003L, 15775590L, 15730688L, 15753102L,
15810075L, 15723373L, 15795298L, 15584320L, 15724161L, 15750056L,
15609637L, 15794493L, 15569641L, 15815236L, 15811177L, 15680587L,
15672821L, 15767681L, 15600379L, 15801336L, 15721592L, 15581282L,
15746203L, 15583137L, 15680752L, 15688172L, 15791373L, 15589449L,
15692819L, 15727467L, 15734312L, 15764604L, 15613014L, 15759684L,
15609669L, 15685536L, 15750447L, 15663249L, 15638646L, 15734161L,
15631070L, 15761950L, 15649668L, 15713912L, 15586757L, 15596522L,
15625395L, 15760570L, 15566689L, 15725794L, 15673539L, 15705298L,
15675791L, 15747043L, 15736397L, 15678201L, 15720745L, 15637593L,
15598070L, 15787550L, 15603942L, 15733973L, 15596761L, 15652400L,
15717893L, 15622585L, 15733964L, 15753861L, 15747097L, 15594762L,
15667417L, 15684861L, 15742204L, 15623502L, 15774872L, 15611191L,
15674331L, 15619465L, 15575247L, 15695679L, 15713463L, 15785170L,
15796351L, 15639576L, 15693264L, 15589715L, 15769902L, 15587177L,
15814553L, 15601550L, 15664907L, 15612465L, 15810800L, 15665760L,
15588080L, 15776844L, 15717560L, 15629739L, 15729908L, 15716781L,
15646936L, 15768151L, 15579212L, 15721835L, 15800515L, 15591279L,
15587419L, 15750335L, 15699619L, 15606472L, 15778368L, 15671387L,
15573926L, 15709183L, 15577514L, 15778830L, 15768072L, 15768293L,
15654456L, 15807525L, 15574372L, 15671249L, 15779744L, 15624755L,
15611430L, 15774744L, 15629885L, 15708791L, 15793890L, 15646091L,
15596984L, 15800215L, 15577806L, 15749381L, 15683758L, 15670615L,
15715622L, 15707634L, 15806901L, 15775335L, 15724150L, 15627220L,
15672330L, 15668521L, 15807837L, 15592570L, 15748589L, 15635893L,
15757632L, 15691863L, 15706071L, 15654296L, 15755018L, 15594041L
), Gender = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L,
2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L,
2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L,
1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L,
1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L,
2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L,
1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L,
1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L,
2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L,
1L, 1L, 2L, 1L, 2L, 1L), .Label = c("Female", "Male"), class = "factor"),
Age = c(19L, 35L, 26L, 27L, 19L, 27L, 27L, 32L, 25L, 35L,
26L, 26L, 20L, 32L, 18L, 29L, 47L, 45L, 46L, 48L, 45L, 47L,
48L, 45L, 46L, 47L, 49L, 47L, 29L, 31L, 31L, 27L, 21L, 28L,
27L, 35L, 33L, 30L, 26L, 27L, 27L, 33L, 35L, 30L, 28L, 23L,
25L, 27L, 30L, 31L, 24L, 18L, 29L, 35L, 27L, 24L, 23L, 28L,
22L, 32L, 27L, 25L, 23L, 32L, 59L, 24L, 24L, 23L, 22L, 31L,
25L, 24L, 20L, 33L, 32L, 34L, 18L, 22L, 28L, 26L, 30L, 39L,
20L, 35L, 30L, 31L, 24L, 28L, 26L, 35L, 22L, 30L, 26L, 29L,
29L, 35L, 35L, 28L, 35L, 28L, 27L, 28L, 32L, 33L, 19L, 21L,
26L, 27L, 26L, 38L, 39L, 37L, 38L, 37L, 42L, 40L, 35L, 36L,
40L, 41L, 36L, 37L, 40L, 35L, 41L, 39L, 42L, 26L, 30L, 26L,
31L, 33L, 30L, 21L, 28L, 23L, 20L, 30L, 28L, 19L, 19L, 18L,
35L, 30L, 34L, 24L, 27L, 41L, 29L, 20L, 26L, 41L, 31L, 36L,
40L, 31L, 46L, 29L, 26L, 32L, 32L, 25L, 37L, 35L, 33L, 18L,
22L, 35L, 29L, 29L, 21L, 34L, 26L, 34L, 34L, 23L, 35L, 25L,
24L, 31L, 26L, 31L, 32L, 33L, 33L, 31L, 20L, 33L, 35L, 28L,
24L, 19L, 29L, 19L, 28L, 34L, 30L, 20L, 26L, 35L, 35L, 49L,
39L, 41L, 58L, 47L, 55L, 52L, 40L, 46L, 48L, 52L, 59L, 35L,
47L, 60L, 49L, 40L, 46L, 59L, 41L, 35L, 37L, 60L, 35L, 37L,
36L, 56L, 40L, 42L, 35L, 39L, 40L, 49L, 38L, 46L, 40L, 37L,
46L, 53L, 42L, 38L, 50L, 56L, 41L, 51L, 35L, 57L, 41L, 35L,
44L, 37L, 48L, 37L, 50L, 52L, 41L, 40L, 58L, 45L, 35L, 36L,
55L, 35L, 48L, 42L, 40L, 37L, 47L, 40L, 43L, 59L, 60L, 39L,
57L, 57L, 38L, 49L, 52L, 50L, 59L, 35L, 37L, 52L, 48L, 37L,
37L, 48L, 41L, 37L, 39L, 49L, 55L, 37L, 35L, 36L, 42L, 43L,
45L, 46L, 58L, 48L, 37L, 37L, 40L, 42L, 51L, 47L, 36L, 38L,
42L, 39L, 38L, 49L, 39L, 39L, 54L, 35L, 45L, 36L, 52L, 53L,
41L, 48L, 48L, 41L, 41L, 42L, 36L, 47L, 38L, 48L, 42L, 40L,
57L, 36L, 58L, 35L, 38L, 39L, 53L, 35L, 38L, 47L, 47L, 41L,
53L, 54L, 39L, 38L, 38L, 37L, 42L, 37L, 36L, 60L, 54L, 41L,
40L, 42L, 43L, 53L, 47L, 42L, 42L, 59L, 58L, 46L, 38L, 54L,
60L, 60L, 39L, 59L, 37L, 46L, 46L, 42L, 41L, 58L, 42L, 48L,
44L, 49L, 57L, 56L, 49L, 39L, 47L, 48L, 48L, 47L, 45L, 60L,
39L, 46L, 51L, 50L, 36L, 49L), EstimatedSalary = c(19000L,
20000L, 43000L, 57000L, 76000L, 58000L, 84000L, 150000L,
33000L, 65000L, 80000L, 52000L, 86000L, 18000L, 82000L, 80000L,
25000L, 26000L, 28000L, 29000L, 22000L, 49000L, 41000L, 22000L,
23000L, 20000L, 28000L, 30000L, 43000L, 18000L, 74000L, 137000L,
16000L, 44000L, 90000L, 27000L, 28000L, 49000L, 72000L, 31000L,
17000L, 51000L, 108000L, 15000L, 84000L, 20000L, 79000L,
54000L, 135000L, 89000L, 32000L, 44000L, 83000L, 23000L,
58000L, 55000L, 48000L, 79000L, 18000L, 117000L, 20000L,
87000L, 66000L, 120000L, 83000L, 58000L, 19000L, 82000L,
63000L, 68000L, 80000L, 27000L, 23000L, 113000L, 18000L,
112000L, 52000L, 27000L, 87000L, 17000L, 80000L, 42000L,
49000L, 88000L, 62000L, 118000L, 55000L, 85000L, 81000L,
50000L, 81000L, 116000L, 15000L, 28000L, 83000L, 44000L,
25000L, 123000L, 73000L, 37000L, 88000L, 59000L, 86000L,
149000L, 21000L, 72000L, 35000L, 89000L, 86000L, 80000L,
71000L, 71000L, 61000L, 55000L, 80000L, 57000L, 75000L, 52000L,
59000L, 59000L, 75000L, 72000L, 75000L, 53000L, 51000L, 61000L,
65000L, 32000L, 17000L, 84000L, 58000L, 31000L, 87000L, 68000L,
55000L, 63000L, 82000L, 107000L, 59000L, 25000L, 85000L,
68000L, 59000L, 89000L, 25000L, 89000L, 96000L, 30000L, 61000L,
74000L, 15000L, 45000L, 76000L, 50000L, 47000L, 15000L, 59000L,
75000L, 30000L, 135000L, 100000L, 90000L, 33000L, 38000L,
69000L, 86000L, 55000L, 71000L, 148000L, 47000L, 88000L,
115000L, 118000L, 43000L, 72000L, 28000L, 47000L, 22000L,
23000L, 34000L, 16000L, 71000L, 117000L, 43000L, 60000L,
66000L, 82000L, 41000L, 72000L, 32000L, 84000L, 26000L, 43000L,
70000L, 89000L, 43000L, 79000L, 36000L, 80000L, 22000L, 39000L,
74000L, 134000L, 71000L, 101000L, 47000L, 130000L, 114000L,
142000L, 22000L, 96000L, 150000L, 42000L, 58000L, 43000L,
108000L, 65000L, 78000L, 96000L, 143000L, 80000L, 91000L,
144000L, 102000L, 60000L, 53000L, 126000L, 133000L, 72000L,
80000L, 147000L, 42000L, 107000L, 86000L, 112000L, 79000L,
57000L, 80000L, 82000L, 143000L, 149000L, 59000L, 88000L,
104000L, 72000L, 146000L, 50000L, 122000L, 52000L, 97000L,
39000L, 52000L, 134000L, 146000L, 44000L, 90000L, 72000L,
57000L, 95000L, 131000L, 77000L, 144000L, 125000L, 72000L,
90000L, 108000L, 75000L, 74000L, 144000L, 61000L, 133000L,
76000L, 42000L, 106000L, 26000L, 74000L, 71000L, 88000L,
38000L, 36000L, 88000L, 61000L, 70000L, 21000L, 141000L,
93000L, 62000L, 138000L, 79000L, 78000L, 134000L, 89000L,
39000L, 77000L, 57000L, 63000L, 73000L, 112000L, 79000L,
117000L, 38000L, 74000L, 137000L, 79000L, 60000L, 54000L,
134000L, 113000L, 125000L, 50000L, 70000L, 96000L, 50000L,
141000L, 79000L, 75000L, 104000L, 55000L, 32000L, 60000L,
138000L, 82000L, 52000L, 30000L, 131000L, 60000L, 72000L,
75000L, 118000L, 107000L, 51000L, 119000L, 65000L, 65000L,
60000L, 54000L, 144000L, 79000L, 55000L, 122000L, 104000L,
75000L, 65000L, 51000L, 105000L, 63000L, 72000L, 108000L,
77000L, 61000L, 113000L, 75000L, 90000L, 57000L, 99000L,
34000L, 70000L, 72000L, 71000L, 54000L, 129000L, 34000L,
50000L, 79000L, 104000L, 29000L, 47000L, 88000L, 71000L,
26000L, 46000L, 83000L, 73000L, 130000L, 80000L, 32000L,
74000L, 53000L, 87000L, 23000L, 64000L, 33000L, 139000L,
28000L, 33000L, 60000L, 39000L, 71000L, 34000L, 35000L, 33000L,
23000L, 45000L, 42000L, 59000L, 41000L, 23000L, 20000L, 33000L,
36000L), Purchased = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L,
0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L,
1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L,
1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L,
0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L,
0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L,
1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L,
1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L,
1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L,
1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L,
1L)), class = "data.frame", row.names = c(NA, -400L))
要在函数调用中未提供网格时获取网格的最小值和最大值 cp
,插入符适合具有 cp = 0
的 rpart 模型,并从拟合模型。从这个 table 中获取最小和最大 cp 值,并使用用户指定的 tuneLength
.
从它们制作网格
这可以从插入符号中实现的source of the rpart model看出。
首先拟合一个cp = 0的模型并提取cp table
initialFit <- rpart::rpart(.outcome ~ .,
data = dat,
control = rpart::rpart.control(cp = 0))$cptable
initialFit <- initialFit[order(-initialFit[,"CP"]), , drop = FALSE]
以下部分指定搜索space
if(search == "grid") {
if(nrow(initialFit) < len) {
tuneSeq <- data.frame(cp = seq(min(initialFit[, "CP"]),
max(initialFit[, "CP"]),
length = len))
} else tuneSeq <- data.frame(cp = initialFit[1:len,"CP"])
colnames(tuneSeq) <- "cp"
} else {
tuneSeq <- data.frame(cp = unique(sample(initialFit[, "CP"],
size = len, replace = TRUE)))
}
如果指定了网格搜索并且tuneLength
(上述函数中的 len 变量)如果大于初始拟合中的 cps 数,则从最小到最大 cp 形成网格,长度等于tuneLength
.
data.frame(cp = seq(min(initialFit[, "CP"]),
max(initialFit[, "CP"]),
length = len))
为了估计所谓的复杂度参数cp
(修剪树很方便),我使用了网格搜索和另一个包caret
的函数。这是代码:
dataset = dataset[3:5]
dataset$Purchased = factor(dataset$Purchased, levels = c(0, 1))
library(caTools)
set.seed(123)
split = sample.split(dataset$Purchased, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
training_set[-3] = scale(training_set[-3])
test_set[-3] = scale(test_set[-3])
library(rpart)
library(caret)
youdenSumary <- function(data, lev = NULL, model = NULL){
if (length(lev) > 2) {
stop(paste("Your outcome has", length(lev), "levels. The joudenSumary() function isn't appropriate."))
}
if (!all(levels(data[, "pred"]) == lev)) {
stop("levels of observed and predicted data do not match")
}
Sens <- caret::sensitivity(data[, "pred"], data[, "obs"], lev[1])
Spec <- caret::specificity(data[, "pred"], data[, "obs"], lev[2])
j <- (Sens + Spec)/2
out <- c(j, Spec, Sens)
names(out) <- c("j", "Spec", "Sens")
out
}
library(caret)
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 20,
search = "grid",summaryFunction = youdenSumary)
classifier = train(form = Purchased ~ ., data = training_set, method = 'rpart',
trControl=trctrl,tuneLength = 10,metric = "j")
classifier
此代码产生以下结果:
cp j Spec Sens
0.00000000 0.9133409 0.9141818 0.9125000
0.06126687 0.9205592 0.9345000 0.9066184
0.12253375 0.9205592 0.9345000 0.9066184
0.18380062 0.9205592 0.9345000 0.9066184
0.24506750 0.8009091 0.6773182 0.9245000
0.30633437 0.7884659 0.6396818 0.9372500
0.36760125 0.7884659 0.6396818 0.9372500
0.42886812 0.7884659 0.6396818 0.9372500
0.49013499 0.7884659 0.6396818 0.9372500
0.55140187 0.6588846 0.3660455 0.9517237
j was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.1838006.
我想知道算法如何为 cp
选择候选值。我阅读了 cp
的 10 个可能值;该算法在其中选择最大化 j
的值,这对我来说很清楚。但是要分配给 cp
的可能值从何而来?
数据集如下:
> dput(dataset)
structure(list(Age = c(19L, 35L, 26L, 27L, 19L, 27L, 27L, 32L,
25L, 35L, 26L, 26L, 20L, 32L, 18L, 29L, 47L, 45L, 46L, 48L, 45L,
47L, 48L, 45L, 46L, 47L, 49L, 47L, 29L, 31L, 31L, 27L, 21L, 28L,
27L, 35L, 33L, 30L, 26L, 27L, 27L, 33L, 35L, 30L, 28L, 23L, 25L,
27L, 30L, 31L, 24L, 18L, 29L, 35L, 27L, 24L, 23L, 28L, 22L, 32L,
27L, 25L, 23L, 32L, 59L, 24L, 24L, 23L, 22L, 31L, 25L, 24L, 20L,
33L, 32L, 34L, 18L, 22L, 28L, 26L, 30L, 39L, 20L, 35L, 30L, 31L,
24L, 28L, 26L, 35L, 22L, 30L, 26L, 29L, 29L, 35L, 35L, 28L, 35L,
28L, 27L, 28L, 32L, 33L, 19L, 21L, 26L, 27L, 26L, 38L, 39L, 37L,
38L, 37L, 42L, 40L, 35L, 36L, 40L, 41L, 36L, 37L, 40L, 35L, 41L,
39L, 42L, 26L, 30L, 26L, 31L, 33L, 30L, 21L, 28L, 23L, 20L, 30L,
28L, 19L, 19L, 18L, 35L, 30L, 34L, 24L, 27L, 41L, 29L, 20L, 26L,
41L, 31L, 36L, 40L, 31L, 46L, 29L, 26L, 32L, 32L, 25L, 37L, 35L,
33L, 18L, 22L, 35L, 29L, 29L, 21L, 34L, 26L, 34L, 34L, 23L, 35L,
25L, 24L, 31L, 26L, 31L, 32L, 33L, 33L, 31L, 20L, 33L, 35L, 28L,
24L, 19L, 29L, 19L, 28L, 34L, 30L, 20L, 26L, 35L, 35L, 49L, 39L,
41L, 58L, 47L, 55L, 52L, 40L, 46L, 48L, 52L, 59L, 35L, 47L, 60L,
49L, 40L, 46L, 59L, 41L, 35L, 37L, 60L, 35L, 37L, 36L, 56L, 40L,
42L, 35L, 39L, 40L, 49L, 38L, 46L, 40L, 37L, 46L, 53L, 42L, 38L,
50L, 56L, 41L, 51L, 35L, 57L, 41L, 35L, 44L, 37L, 48L, 37L, 50L,
52L, 41L, 40L, 58L, 45L, 35L, 36L, 55L, 35L, 48L, 42L, 40L, 37L,
47L, 40L, 43L, 59L, 60L, 39L, 57L, 57L, 38L, 49L, 52L, 50L, 59L,
35L, 37L, 52L, 48L, 37L, 37L, 48L, 41L, 37L, 39L, 49L, 55L, 37L,
35L, 36L, 42L, 43L, 45L, 46L, 58L, 48L, 37L, 37L, 40L, 42L, 51L,
47L, 36L, 38L, 42L, 39L, 38L, 49L, 39L, 39L, 54L, 35L, 45L, 36L,
52L, 53L, 41L, 48L, 48L, 41L, 41L, 42L, 36L, 47L, 38L, 48L, 42L,
40L, 57L, 36L, 58L, 35L, 38L, 39L, 53L, 35L, 38L, 47L, 47L, 41L,
53L, 54L, 39L, 38L, 38L, 37L, 42L, 37L, 36L, 60L, 54L, 41L, 40L,
42L, 43L, 53L, 47L, 42L, 42L, 59L, 58L, 46L, 38L, 54L, 60L, 60L,
39L, 59L, 37L, 46L, 46L, 42L, 41L, 58L, 42L, 48L, 44L, 49L, 57L,
56L, 49L, 39L, 47L, 48L, 48L, 47L, 45L, 60L, 39L, 46L, 51L, 50L,
36L, 49L), EstimatedSalary = c(19000L, 20000L, 43000L, 57000L,
76000L, 58000L, 84000L, 150000L, 33000L, 65000L, 80000L, 52000L,
86000L, 18000L, 82000L, 80000L, 25000L, 26000L, 28000L, 29000L,
22000L, 49000L, 41000L, 22000L, 23000L, 20000L, 28000L, 30000L,
43000L, 18000L, 74000L, 137000L, 16000L, 44000L, 90000L, 27000L,
28000L, 49000L, 72000L, 31000L, 17000L, 51000L, 108000L, 15000L,
84000L, 20000L, 79000L, 54000L, 135000L, 89000L, 32000L, 44000L,
83000L, 23000L, 58000L, 55000L, 48000L, 79000L, 18000L, 117000L,
20000L, 87000L, 66000L, 120000L, 83000L, 58000L, 19000L, 82000L,
63000L, 68000L, 80000L, 27000L, 23000L, 113000L, 18000L, 112000L,
52000L, 27000L, 87000L, 17000L, 80000L, 42000L, 49000L, 88000L,
62000L, 118000L, 55000L, 85000L, 81000L, 50000L, 81000L, 116000L,
15000L, 28000L, 83000L, 44000L, 25000L, 123000L, 73000L, 37000L,
88000L, 59000L, 86000L, 149000L, 21000L, 72000L, 35000L, 89000L,
86000L, 80000L, 71000L, 71000L, 61000L, 55000L, 80000L, 57000L,
75000L, 52000L, 59000L, 59000L, 75000L, 72000L, 75000L, 53000L,
51000L, 61000L, 65000L, 32000L, 17000L, 84000L, 58000L, 31000L,
87000L, 68000L, 55000L, 63000L, 82000L, 107000L, 59000L, 25000L,
85000L, 68000L, 59000L, 89000L, 25000L, 89000L, 96000L, 30000L,
61000L, 74000L, 15000L, 45000L, 76000L, 50000L, 47000L, 15000L,
59000L, 75000L, 30000L, 135000L, 100000L, 90000L, 33000L, 38000L,
69000L, 86000L, 55000L, 71000L, 148000L, 47000L, 88000L, 115000L,
118000L, 43000L, 72000L, 28000L, 47000L, 22000L, 23000L, 34000L,
16000L, 71000L, 117000L, 43000L, 60000L, 66000L, 82000L, 41000L,
72000L, 32000L, 84000L, 26000L, 43000L, 70000L, 89000L, 43000L,
79000L, 36000L, 80000L, 22000L, 39000L, 74000L, 134000L, 71000L,
101000L, 47000L, 130000L, 114000L, 142000L, 22000L, 96000L, 150000L,
42000L, 58000L, 43000L, 108000L, 65000L, 78000L, 96000L, 143000L,
80000L, 91000L, 144000L, 102000L, 60000L, 53000L, 126000L, 133000L,
72000L, 80000L, 147000L, 42000L, 107000L, 86000L, 112000L, 79000L,
57000L, 80000L, 82000L, 143000L, 149000L, 59000L, 88000L, 104000L,
72000L, 146000L, 50000L, 122000L, 52000L, 97000L, 39000L, 52000L,
134000L, 146000L, 44000L, 90000L, 72000L, 57000L, 95000L, 131000L,
77000L, 144000L, 125000L, 72000L, 90000L, 108000L, 75000L, 74000L,
144000L, 61000L, 133000L, 76000L, 42000L, 106000L, 26000L, 74000L,
71000L, 88000L, 38000L, 36000L, 88000L, 61000L, 70000L, 21000L,
141000L, 93000L, 62000L, 138000L, 79000L, 78000L, 134000L, 89000L,
39000L, 77000L, 57000L, 63000L, 73000L, 112000L, 79000L, 117000L,
38000L, 74000L, 137000L, 79000L, 60000L, 54000L, 134000L, 113000L,
125000L, 50000L, 70000L, 96000L, 50000L, 141000L, 79000L, 75000L,
104000L, 55000L, 32000L, 60000L, 138000L, 82000L, 52000L, 30000L,
131000L, 60000L, 72000L, 75000L, 118000L, 107000L, 51000L, 119000L,
65000L, 65000L, 60000L, 54000L, 144000L, 79000L, 55000L, 122000L,
104000L, 75000L, 65000L, 51000L, 105000L, 63000L, 72000L, 108000L,
77000L, 61000L, 113000L, 75000L, 90000L, 57000L, 99000L, 34000L,
70000L, 72000L, 71000L, 54000L, 129000L, 34000L, 50000L, 79000L,
104000L, 29000L, 47000L, 88000L, 71000L, 26000L, 46000L, 83000L,
73000L, 130000L, 80000L, 32000L, 74000L, 53000L, 87000L, 23000L,
64000L, 33000L, 139000L, 28000L, 33000L, 60000L, 39000L, 71000L,
34000L, 35000L, 33000L, 23000L, 45000L, 42000L, 59000L, 41000L,
23000L, 20000L, 33000L, 36000L), Purchased = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L,
1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L,
1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L,
1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L), .Label = c("0",
"1"), class = "factor")), row.names = c(NA, -400L), class = "data.frame")
> dataset = read.csv('Social_Network_Ads.csv')
> dput(dataset)
structure(list(User.ID = c(15624510L, 15810944L, 15668575L, 15603246L,
15804002L, 15728773L, 15598044L, 15694829L, 15600575L, 15727311L,
15570769L, 15606274L, 15746139L, 15704987L, 15628972L, 15697686L,
15733883L, 15617482L, 15704583L, 15621083L, 15649487L, 15736760L,
15714658L, 15599081L, 15705113L, 15631159L, 15792818L, 15633531L,
15744529L, 15669656L, 15581198L, 15729054L, 15573452L, 15776733L,
15724858L, 15713144L, 15690188L, 15689425L, 15671766L, 15782806L,
15764419L, 15591915L, 15772798L, 15792008L, 15715541L, 15639277L,
15798850L, 15776348L, 15727696L, 15793813L, 15694395L, 15764195L,
15744919L, 15671655L, 15654901L, 15649136L, 15775562L, 15807481L,
15642885L, 15789109L, 15814004L, 15673619L, 15595135L, 15583681L,
15605000L, 15718071L, 15679760L, 15654574L, 15577178L, 15595324L,
15756932L, 15726358L, 15595228L, 15782530L, 15592877L, 15651983L,
15746737L, 15774179L, 15667265L, 15655123L, 15595917L, 15668385L,
15709476L, 15711218L, 15798659L, 15663939L, 15694946L, 15631912L,
15768816L, 15682268L, 15684801L, 15636428L, 15809823L, 15699284L,
15786993L, 15709441L, 15710257L, 15582492L, 15575694L, 15756820L,
15766289L, 15593014L, 15584545L, 15675949L, 15672091L, 15801658L,
15706185L, 15789863L, 15720943L, 15697997L, 15665416L, 15660200L,
15619653L, 15773447L, 15739160L, 15689237L, 15679297L, 15591433L,
15642725L, 15701962L, 15811613L, 15741049L, 15724423L, 15574305L,
15678168L, 15697020L, 15610801L, 15745232L, 15722758L, 15792102L,
15675185L, 15801247L, 15725660L, 15638963L, 15800061L, 15578006L,
15668504L, 15687491L, 15610403L, 15741094L, 15807909L, 15666141L,
15617134L, 15783029L, 15622833L, 15746422L, 15750839L, 15749130L,
15779862L, 15767871L, 15679651L, 15576219L, 15699247L, 15619087L,
15605327L, 15610140L, 15791174L, 15602373L, 15762605L, 15598840L,
15744279L, 15670619L, 15599533L, 15757837L, 15697574L, 15578738L,
15762228L, 15614827L, 15789815L, 15579781L, 15587013L, 15570932L,
15794661L, 15581654L, 15644296L, 15614420L, 15609653L, 15594577L,
15584114L, 15673367L, 15685576L, 15774727L, 15694288L, 15603319L,
15759066L, 15814816L, 15724402L, 15571059L, 15674206L, 15715160L,
15730448L, 15662067L, 15779581L, 15662901L, 15689751L, 15667742L,
15738448L, 15680243L, 15745083L, 15708228L, 15628523L, 15708196L,
15735549L, 15809347L, 15660866L, 15766609L, 15654230L, 15794566L,
15800890L, 15697424L, 15724536L, 15735878L, 15707596L, 15657163L,
15622478L, 15779529L, 15636023L, 15582066L, 15666675L, 15732987L,
15789432L, 15663161L, 15694879L, 15593715L, 15575002L, 15622171L,
15795224L, 15685346L, 15691808L, 15721007L, 15794253L, 15694453L,
15813113L, 15614187L, 15619407L, 15646227L, 15660541L, 15753874L,
15617877L, 15772073L, 15701537L, 15736228L, 15780572L, 15769596L,
15586996L, 15722061L, 15638003L, 15775590L, 15730688L, 15753102L,
15810075L, 15723373L, 15795298L, 15584320L, 15724161L, 15750056L,
15609637L, 15794493L, 15569641L, 15815236L, 15811177L, 15680587L,
15672821L, 15767681L, 15600379L, 15801336L, 15721592L, 15581282L,
15746203L, 15583137L, 15680752L, 15688172L, 15791373L, 15589449L,
15692819L, 15727467L, 15734312L, 15764604L, 15613014L, 15759684L,
15609669L, 15685536L, 15750447L, 15663249L, 15638646L, 15734161L,
15631070L, 15761950L, 15649668L, 15713912L, 15586757L, 15596522L,
15625395L, 15760570L, 15566689L, 15725794L, 15673539L, 15705298L,
15675791L, 15747043L, 15736397L, 15678201L, 15720745L, 15637593L,
15598070L, 15787550L, 15603942L, 15733973L, 15596761L, 15652400L,
15717893L, 15622585L, 15733964L, 15753861L, 15747097L, 15594762L,
15667417L, 15684861L, 15742204L, 15623502L, 15774872L, 15611191L,
15674331L, 15619465L, 15575247L, 15695679L, 15713463L, 15785170L,
15796351L, 15639576L, 15693264L, 15589715L, 15769902L, 15587177L,
15814553L, 15601550L, 15664907L, 15612465L, 15810800L, 15665760L,
15588080L, 15776844L, 15717560L, 15629739L, 15729908L, 15716781L,
15646936L, 15768151L, 15579212L, 15721835L, 15800515L, 15591279L,
15587419L, 15750335L, 15699619L, 15606472L, 15778368L, 15671387L,
15573926L, 15709183L, 15577514L, 15778830L, 15768072L, 15768293L,
15654456L, 15807525L, 15574372L, 15671249L, 15779744L, 15624755L,
15611430L, 15774744L, 15629885L, 15708791L, 15793890L, 15646091L,
15596984L, 15800215L, 15577806L, 15749381L, 15683758L, 15670615L,
15715622L, 15707634L, 15806901L, 15775335L, 15724150L, 15627220L,
15672330L, 15668521L, 15807837L, 15592570L, 15748589L, 15635893L,
15757632L, 15691863L, 15706071L, 15654296L, 15755018L, 15594041L
), Gender = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L,
2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L,
2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L,
1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L,
1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L,
2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L,
1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L,
1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L,
2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L,
1L, 1L, 2L, 1L, 2L, 1L), .Label = c("Female", "Male"), class = "factor"),
Age = c(19L, 35L, 26L, 27L, 19L, 27L, 27L, 32L, 25L, 35L,
26L, 26L, 20L, 32L, 18L, 29L, 47L, 45L, 46L, 48L, 45L, 47L,
48L, 45L, 46L, 47L, 49L, 47L, 29L, 31L, 31L, 27L, 21L, 28L,
27L, 35L, 33L, 30L, 26L, 27L, 27L, 33L, 35L, 30L, 28L, 23L,
25L, 27L, 30L, 31L, 24L, 18L, 29L, 35L, 27L, 24L, 23L, 28L,
22L, 32L, 27L, 25L, 23L, 32L, 59L, 24L, 24L, 23L, 22L, 31L,
25L, 24L, 20L, 33L, 32L, 34L, 18L, 22L, 28L, 26L, 30L, 39L,
20L, 35L, 30L, 31L, 24L, 28L, 26L, 35L, 22L, 30L, 26L, 29L,
29L, 35L, 35L, 28L, 35L, 28L, 27L, 28L, 32L, 33L, 19L, 21L,
26L, 27L, 26L, 38L, 39L, 37L, 38L, 37L, 42L, 40L, 35L, 36L,
40L, 41L, 36L, 37L, 40L, 35L, 41L, 39L, 42L, 26L, 30L, 26L,
31L, 33L, 30L, 21L, 28L, 23L, 20L, 30L, 28L, 19L, 19L, 18L,
35L, 30L, 34L, 24L, 27L, 41L, 29L, 20L, 26L, 41L, 31L, 36L,
40L, 31L, 46L, 29L, 26L, 32L, 32L, 25L, 37L, 35L, 33L, 18L,
22L, 35L, 29L, 29L, 21L, 34L, 26L, 34L, 34L, 23L, 35L, 25L,
24L, 31L, 26L, 31L, 32L, 33L, 33L, 31L, 20L, 33L, 35L, 28L,
24L, 19L, 29L, 19L, 28L, 34L, 30L, 20L, 26L, 35L, 35L, 49L,
39L, 41L, 58L, 47L, 55L, 52L, 40L, 46L, 48L, 52L, 59L, 35L,
47L, 60L, 49L, 40L, 46L, 59L, 41L, 35L, 37L, 60L, 35L, 37L,
36L, 56L, 40L, 42L, 35L, 39L, 40L, 49L, 38L, 46L, 40L, 37L,
46L, 53L, 42L, 38L, 50L, 56L, 41L, 51L, 35L, 57L, 41L, 35L,
44L, 37L, 48L, 37L, 50L, 52L, 41L, 40L, 58L, 45L, 35L, 36L,
55L, 35L, 48L, 42L, 40L, 37L, 47L, 40L, 43L, 59L, 60L, 39L,
57L, 57L, 38L, 49L, 52L, 50L, 59L, 35L, 37L, 52L, 48L, 37L,
37L, 48L, 41L, 37L, 39L, 49L, 55L, 37L, 35L, 36L, 42L, 43L,
45L, 46L, 58L, 48L, 37L, 37L, 40L, 42L, 51L, 47L, 36L, 38L,
42L, 39L, 38L, 49L, 39L, 39L, 54L, 35L, 45L, 36L, 52L, 53L,
41L, 48L, 48L, 41L, 41L, 42L, 36L, 47L, 38L, 48L, 42L, 40L,
57L, 36L, 58L, 35L, 38L, 39L, 53L, 35L, 38L, 47L, 47L, 41L,
53L, 54L, 39L, 38L, 38L, 37L, 42L, 37L, 36L, 60L, 54L, 41L,
40L, 42L, 43L, 53L, 47L, 42L, 42L, 59L, 58L, 46L, 38L, 54L,
60L, 60L, 39L, 59L, 37L, 46L, 46L, 42L, 41L, 58L, 42L, 48L,
44L, 49L, 57L, 56L, 49L, 39L, 47L, 48L, 48L, 47L, 45L, 60L,
39L, 46L, 51L, 50L, 36L, 49L), EstimatedSalary = c(19000L,
20000L, 43000L, 57000L, 76000L, 58000L, 84000L, 150000L,
33000L, 65000L, 80000L, 52000L, 86000L, 18000L, 82000L, 80000L,
25000L, 26000L, 28000L, 29000L, 22000L, 49000L, 41000L, 22000L,
23000L, 20000L, 28000L, 30000L, 43000L, 18000L, 74000L, 137000L,
16000L, 44000L, 90000L, 27000L, 28000L, 49000L, 72000L, 31000L,
17000L, 51000L, 108000L, 15000L, 84000L, 20000L, 79000L,
54000L, 135000L, 89000L, 32000L, 44000L, 83000L, 23000L,
58000L, 55000L, 48000L, 79000L, 18000L, 117000L, 20000L,
87000L, 66000L, 120000L, 83000L, 58000L, 19000L, 82000L,
63000L, 68000L, 80000L, 27000L, 23000L, 113000L, 18000L,
112000L, 52000L, 27000L, 87000L, 17000L, 80000L, 42000L,
49000L, 88000L, 62000L, 118000L, 55000L, 85000L, 81000L,
50000L, 81000L, 116000L, 15000L, 28000L, 83000L, 44000L,
25000L, 123000L, 73000L, 37000L, 88000L, 59000L, 86000L,
149000L, 21000L, 72000L, 35000L, 89000L, 86000L, 80000L,
71000L, 71000L, 61000L, 55000L, 80000L, 57000L, 75000L, 52000L,
59000L, 59000L, 75000L, 72000L, 75000L, 53000L, 51000L, 61000L,
65000L, 32000L, 17000L, 84000L, 58000L, 31000L, 87000L, 68000L,
55000L, 63000L, 82000L, 107000L, 59000L, 25000L, 85000L,
68000L, 59000L, 89000L, 25000L, 89000L, 96000L, 30000L, 61000L,
74000L, 15000L, 45000L, 76000L, 50000L, 47000L, 15000L, 59000L,
75000L, 30000L, 135000L, 100000L, 90000L, 33000L, 38000L,
69000L, 86000L, 55000L, 71000L, 148000L, 47000L, 88000L,
115000L, 118000L, 43000L, 72000L, 28000L, 47000L, 22000L,
23000L, 34000L, 16000L, 71000L, 117000L, 43000L, 60000L,
66000L, 82000L, 41000L, 72000L, 32000L, 84000L, 26000L, 43000L,
70000L, 89000L, 43000L, 79000L, 36000L, 80000L, 22000L, 39000L,
74000L, 134000L, 71000L, 101000L, 47000L, 130000L, 114000L,
142000L, 22000L, 96000L, 150000L, 42000L, 58000L, 43000L,
108000L, 65000L, 78000L, 96000L, 143000L, 80000L, 91000L,
144000L, 102000L, 60000L, 53000L, 126000L, 133000L, 72000L,
80000L, 147000L, 42000L, 107000L, 86000L, 112000L, 79000L,
57000L, 80000L, 82000L, 143000L, 149000L, 59000L, 88000L,
104000L, 72000L, 146000L, 50000L, 122000L, 52000L, 97000L,
39000L, 52000L, 134000L, 146000L, 44000L, 90000L, 72000L,
57000L, 95000L, 131000L, 77000L, 144000L, 125000L, 72000L,
90000L, 108000L, 75000L, 74000L, 144000L, 61000L, 133000L,
76000L, 42000L, 106000L, 26000L, 74000L, 71000L, 88000L,
38000L, 36000L, 88000L, 61000L, 70000L, 21000L, 141000L,
93000L, 62000L, 138000L, 79000L, 78000L, 134000L, 89000L,
39000L, 77000L, 57000L, 63000L, 73000L, 112000L, 79000L,
117000L, 38000L, 74000L, 137000L, 79000L, 60000L, 54000L,
134000L, 113000L, 125000L, 50000L, 70000L, 96000L, 50000L,
141000L, 79000L, 75000L, 104000L, 55000L, 32000L, 60000L,
138000L, 82000L, 52000L, 30000L, 131000L, 60000L, 72000L,
75000L, 118000L, 107000L, 51000L, 119000L, 65000L, 65000L,
60000L, 54000L, 144000L, 79000L, 55000L, 122000L, 104000L,
75000L, 65000L, 51000L, 105000L, 63000L, 72000L, 108000L,
77000L, 61000L, 113000L, 75000L, 90000L, 57000L, 99000L,
34000L, 70000L, 72000L, 71000L, 54000L, 129000L, 34000L,
50000L, 79000L, 104000L, 29000L, 47000L, 88000L, 71000L,
26000L, 46000L, 83000L, 73000L, 130000L, 80000L, 32000L,
74000L, 53000L, 87000L, 23000L, 64000L, 33000L, 139000L,
28000L, 33000L, 60000L, 39000L, 71000L, 34000L, 35000L, 33000L,
23000L, 45000L, 42000L, 59000L, 41000L, 23000L, 20000L, 33000L,
36000L), Purchased = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L,
0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L,
1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L,
1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L,
0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L,
0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L,
1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L,
1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L,
1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L,
1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L,
1L)), class = "data.frame", row.names = c(NA, -400L))
要在函数调用中未提供网格时获取网格的最小值和最大值 cp
,插入符适合具有 cp = 0
的 rpart 模型,并从拟合模型。从这个 table 中获取最小和最大 cp 值,并使用用户指定的 tuneLength
.
这可以从插入符号中实现的source of the rpart model看出。
首先拟合一个cp = 0的模型并提取cp table
initialFit <- rpart::rpart(.outcome ~ .,
data = dat,
control = rpart::rpart.control(cp = 0))$cptable
initialFit <- initialFit[order(-initialFit[,"CP"]), , drop = FALSE]
以下部分指定搜索space
if(search == "grid") {
if(nrow(initialFit) < len) {
tuneSeq <- data.frame(cp = seq(min(initialFit[, "CP"]),
max(initialFit[, "CP"]),
length = len))
} else tuneSeq <- data.frame(cp = initialFit[1:len,"CP"])
colnames(tuneSeq) <- "cp"
} else {
tuneSeq <- data.frame(cp = unique(sample(initialFit[, "CP"],
size = len, replace = TRUE)))
}
如果指定了网格搜索并且tuneLength
(上述函数中的 len 变量)如果大于初始拟合中的 cps 数,则从最小到最大 cp 形成网格,长度等于tuneLength
.
data.frame(cp = seq(min(initialFit[, "CP"]),
max(initialFit[, "CP"]),
length = len))