Error in if (any(co)) { : missing value where TRUE/FALSE needed


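I have been training several models, and when I try to use a support vector machine with a radial basis function (RBF) kernel, I get the following error: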

> svmRFit <- train(x = Fraud_trainX, 
+                  y = Fraud_trainY, 
+                  method = "svmRadial",
+                  metric = "ROC",
+                  preProc = c("center", "scale"),
+                  tuneLength = 15,
+                  trControl = ctrl)
Error in if (any(co)) { : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In FUN(newX[, i], ...) : NAs introduced by coercion
2: In FUN(newX[, i], ...) : NAs introduced by coercion
3: In FUN(newX[, i], ...) : NAs introduced by coercion
4: In FUN(newX[, i], ...) : NAs introduced by coercion
5: In FUN(newX[, i], ...) : NAs introduced by coercion
Called from: .local(x, ...)
Browse[1]>
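A plausible reading of this error, sketched on a toy data frame: caret appears to pass the `svmRadial` predictors through `as.matrix()`, and a data frame that contains factor columns becomes a *character* matrix. A constant-column check of the form `co <- !apply(x, 2, var)` followed by `if (any(co))` (assumed here to be what `kernlab::ksvm` runs internally, which matches the `FUN(newX[, i], ...)` coercion warnings) then gets `NA` for every factor column, and `if (NA)` fails. The values below are made up.

```r
# Hypothetical toy data: one factor column, one numeric column
df <- data.frame(Make = factor(c("Pontiac", "Toyota", "Honda")),
                 Age  = c(34, 65, 28))

# A factor column forces as.matrix() to produce a character matrix
m <- as.matrix(df)
print(typeof(m))  # "character"

# Constant-column check like the one named in the error message.
# var() coerces each character column to numeric: "Pontiac" etc. become NA
# ("NAs introduced by coercion"), so that column's variance is NA.
co <- suppressWarnings(!apply(m, 2, var))
print(co)         # Make is NA, Age is FALSE

# any() over NA and FALSE is NA, and if (NA) is an error
outcome <- tryCatch(if (any(co)) "constant column found" else "ok",
                    error = function(e) e)
print(inherits(outcome, "error"))  # TRUE
```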

Here is a summary of my training data:

summary(Fraud_trainX)
        Make      AccidentArea                PolicyType   VehicleCategory
 Pontiac  :1412   Rural: 597   SedC                :2109   Sedan  :3660   
 Toyota   :1177   Urban:5186   SedL                :1857   Sport  :1994   
 Honda    :1054                SedA                :1551   Utility: 129   
 Mazda    : 883                SpoC                : 126                  
 Chevrolet: 637                Utility - All Perils: 113                  
 Accura   : 183                UtiCL               :  16                  
 (Other)  : 437                (Other)             :  11                  
 BasePolicy WeekOfMonthClaimed      Age         PolicyNumber     RepNumber     
 AP:1675    Min.   :1.000      Min.   :16.00   Min.   :    2   Min.   : 1.000  
 C :2246    1st Qu.:2.000      1st Qu.:31.00   1st Qu.: 3866   1st Qu.: 4.000  
 L :1862    Median :3.000      Median :38.00   Median : 7757   Median : 9.000  
            Mean   :2.703      Mean   :40.71   Mean   : 7754   Mean   : 8.473  
            3rd Qu.:4.000      3rd Qu.:49.00   3rd Qu.:11556   3rd Qu.:12.000  
            Max.   :5.000      Max.   :80.00   Max.   :15420   Max.   :16.000  
                               NA's   :130                                     
   Deductible     DriverRating     ClaimSize          Month       
 Min.   :400.0   Min.   :1.000   Min.   :     0   Min.   : 1.000  
 1st Qu.:400.0   1st Qu.:1.000   1st Qu.:  4112   1st Qu.: 3.000  
 Median :400.0   Median :3.000   Median :  8150   Median : 6.000  
 Mean   :407.3   Mean   :2.488   Mean   : 22921   Mean   : 6.384  
 3rd Qu.:400.0   3rd Qu.:3.000   3rd Qu.: 43446   3rd Qu.: 9.000  
 Max.   :700.0   Max.   :4.000   Max.   :141394   Max.   :12.000  
                 NA's   :4                                        
  WeekOfMonth      DayOfWeek     DayOfWeekClaimed  MonthClaimed   
 Min.   :1.000   Min.   :1.000   Min.   :1.000    Min.   : 1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000    1st Qu.: 3.000  
 Median :3.000   Median :4.000   Median :3.000    Median : 6.000  
 Mean   :2.776   Mean   :3.844   Mean   :2.824    Mean   : 6.345  
 3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:4.000    3rd Qu.: 9.000  
 Max.   :5.000   Max.   :7.000   Max.   :7.000    Max.   :12.000  
                                                                  
      Sex         MaritalStatus       Fault         VehiclePrice  
 Min.   :0.0000   Min.   :1.000   Min.   :0.0000   Min.   :1.000  
 1st Qu.:1.0000   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:2.000  
 Median :1.0000   Median :2.000   Median :0.0000   Median :2.000  
 Mean   :0.8406   Mean   :1.698   Mean   :0.2722   Mean   :2.783  
 3rd Qu.:1.0000   3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.:3.000  
 Max.   :1.0000   Max.   :3.000   Max.   :1.0000   Max.   :6.000  
                                                                  
 Days_Policy_Accident Days_Policy_Claim PastNumberOfClaims  AgeOfVehicle  
 Min.   :0.000        Min.   :1.000     Min.   :0.000      Min.   :0.000  
 1st Qu.:4.000        1st Qu.:3.000     1st Qu.:0.000      1st Qu.:6.000  
 Median :4.000        Median :3.000     Median :1.000      Median :7.000  
 Mean   :3.971        Mean   :2.993     Mean   :1.333      Mean   :6.592  
 3rd Qu.:4.000        3rd Qu.:3.000     3rd Qu.:2.000      3rd Qu.:8.000  
 Max.   :4.000        Max.   :3.000     Max.   :3.000      Max.   :8.000  
                                                                          
 AgeOfPolicyHolder PoliceReportFiled WitnessPresent      AgentType      
 Min.   :1.00      Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
 1st Qu.:5.00      1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000  
 Median :6.00      Median :0.00000   Median :0.00000   Median :0.00000  
 Mean   :5.89      Mean   :0.02957   Mean   :0.00536   Mean   :0.01504  
 3rd Qu.:7.00      3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000  
 Max.   :9.00      Max.   :1.00000   Max.   :1.00000   Max.   :1.00000  
                                                                        
 NumberOfSuppliments AddressChange_Claim  NumberOfCars   
 Min.   :0.000       Min.   :0.0000      Min.   :0.0000  
 1st Qu.:0.000       1st Qu.:0.0000      1st Qu.:0.0000  
 Median :1.000       Median :0.0000      Median :0.0000  
 Mean   :1.163       Mean   :0.1757      Mean   :0.1027  
 3rd Qu.:2.000       3rd Qu.:0.0000      3rd Qu.:0.0000  
 Max.   :3.000       Max.   :3.0000      Max.   :3.0000 

Structure of the data:

str(Fraud_trainX)
'data.frame':   5783 obs. of  32 variables:
 $ Make                : Factor w/ 19 levels "Accura","BMW",..: 7 18 6 7 6 6 6 3 10 7 ...
 $ AccidentArea        : Factor w/ 2 levels "Rural","Urban": 2 1 2 1 2 2 2 2 2 2 ...
 $ PolicyType          : Factor w/ 8 levels "SedA","SedC",..: 5 3 3 2 3 3 1 2 3 2 ...
 $ VehicleCategory     : Factor w/ 3 levels "Sedan","Sport",..: 2 2 2 1 2 2 1 1 2 1 ...
 $ BasePolicy          : Factor w/ 3 levels "AP","C","L": 2 3 3 2 3 3 1 2 3 2 ...
 $ WeekOfMonthClaimed  : num  4 1 3 1 1 5 1 1 1 4 ...
 $ Age                 : num  34 65 28 NA 61 38 41 28 40 21 ...
 $ PolicyNumber        : num  2 4 13 14 15 16 17 18 21 27 ...
 $ RepNumber           : num  15 4 11 12 3 16 15 6 3 1 ...
 $ Deductible          : num  400 400 400 400 400 400 400 400 400 400 ...
 $ DriverRating        : num  4 2 1 3 1 1 4 1 1 2 ...
 $ ClaimSize           : num  59294 7584 59748 82212 59552 ...
 $ Month               : int  1 6 1 1 1 8 4 7 4 3 ...
 $ WeekOfMonth         : int  3 2 3 5 5 4 4 5 2 3 ...
 $ DayOfWeek           : int  3 6 5 5 1 2 4 7 5 4 ...
 $ DayOfWeekClaimed    : int  1 5 5 3 4 1 3 3 2 4 ...
 $ MonthClaimed        : int  1 7 1 2 2 8 5 8 5 6 ...
 $ Sex                 : int  1 1 1 1 1 1 1 0 1 1 ...
 $ MaritalStatus       : int  1 2 2 1 2 1 2 2 2 2 ...
 $ Fault               : int  0 1 0 1 0 0 0 1 0 0 ...
 $ VehiclePrice        : int  6 2 6 6 6 6 6 2 2 3 ...
 $ Days_Policy_Accident: int  4 4 4 4 4 4 4 4 4 4 ...
 $ Days_Policy_Claim   : int  3 3 3 3 3 3 3 3 3 3 ...
 $ PastNumberOfClaims  : int  0 1 1 0 0 0 0 0 1 3 ...
 $ AgeOfVehicle        : int  6 8 7 0 8 6 7 7 8 5 ...
 $ AgeOfPolicyHolder   : int  5 8 5 1 8 6 6 5 6 4 ...
 $ PoliceReportFiled   : int  1 1 0 0 0 0 0 0 0 0 ...
 $ WitnessPresent      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ AgentType           : int  0 0 0 0 0 0 0 0 0 0 ...
 $ NumberOfSuppliments : int  0 3 0 0 0 0 0 1 3 3 ...
 $ AddressChange_Claim : int  0 0 0 0 0 0 0 0 0 0 ...
 $ NumberOfCars        : int  0 0 0 0 0 0 0 0 0 0 ...
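Before fitting a model that needs an all-numeric matrix, it can help to list the non-numeric predictors and count missing values per column. Note that the `summary()` output above still reports `NA's` for `Age` (130) and `DriverRating` (4), which would also make a variance-based check return `NA`. A minimal base-R sketch; `toy` is a hypothetical stand-in for `Fraud_trainX`.

```r
# Hypothetical stand-in for Fraud_trainX with a few of its column types
toy <- data.frame(
  Make         = factor(c("Pontiac", "Toyota", "Honda", "Mazda")),
  AccidentArea = factor(c("Urban", "Rural", "Urban", "Urban")),
  Age          = c(34, NA, 28, 61),
  DriverRating = c(4, 2, NA, 1),
  Month        = c(1L, 6L, 1L, 1L)
)

# Columns that as.matrix() cannot keep numeric (factors here)
non_numeric <- names(toy)[!vapply(toy, is.numeric, logical(1))]
print(non_numeric)               # "Make" "AccidentArea"

# Missing values per column, and how many rows are fully observed
na_counts <- colSums(is.na(toy))
print(na_counts[na_counts > 0])  # Age 1, DriverRating 1
complete_rows <- complete.cases(toy)
print(sum(complete_rows))        # 2
```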

Response variable:

summary(Fraud_trainY)
  No  Yes 
5440  343 

Here are the resampling indices and the control object I use for training:

indx <- createMultiFolds(Fraud_trainY, k = 5, times = 2)
str(indx)
ctrl <- trainControl(method = "repeatedcv",index = indx, 
                     summaryFunction = twoClassSummary,
                     sampling = "up",
                     classProbs = TRUE)
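With `sampling = "up"`, caret balances the classes by drawing minority-class rows with replacement until they match the majority class, applied within resampling. The sketch below is a base-R approximation of that idea, not caret's own code; `up_sample_idx` is a hypothetical helper, and the response is a made-up factor with the same 5440/343 split as `Fraud_trainY`.

```r
set.seed(1)
# Hypothetical response with the class counts reported for Fraud_trainY
y <- factor(rep(c("No", "Yes"), times = c(5440, 343)))

# Returns row indices that up-sample every class to the largest class size
up_sample_idx <- function(y) {
  counts <- table(y)
  target <- max(counts)
  unlist(lapply(names(counts), function(lev) {
    rows <- which(y == lev)
    # keep every original row, then draw the shortfall with replacement
    extra <- rows[sample.int(length(rows), target - length(rows), replace = TRUE)]
    c(rows, extra)
  }))
}

idx <- up_sample_idx(y)
print(table(y[idx]))  # No 5440, Yes 5440
```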

And here is the model call with its parameters:

svmRFit <- train(x = Fraud_trainX, 
                 y = Fraud_trainY, 
                 method = "svmRadial",
                 metric = "ROC",
                 preProc = c("center", "scale"),
                 tuneLength = 15,
                 trControl = ctrl)

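I have already tried loading the pROC library, but it did not help. I removed the rows containing NAs from all variables, and the response variable already has the levels "No" and "Yes". I have also trained C5.0 ("C5.0"), neural network ("nnet"), and logistic regression ("multinom") models on the same data; all of them ran and produced results. This is the only model that gives me an error.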

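As @AlvaroMartinez pointed out in a comment, the error came from the variables I had stored as factors; after converting those variables to integers, the model trains without problems.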
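The fix described above can be sketched in base R: recode every factor predictor to its integer level codes before calling `train()`. `toy` is a hypothetical stand-in for the real predictors. One caveat: integer codes impose an arbitrary order on nominal variables such as `Make`, so treatment-coded dummy columns from `model.matrix()` are shown as a common alternative.

```r
# Hypothetical stand-in for the predictor data frame
toy <- data.frame(
  Make         = factor(c("Pontiac", "Toyota", "Honda", "Pontiac")),
  AccidentArea = factor(c("Urban", "Rural", "Urban", "Urban")),
  Age          = c(34, 65, 28, 61)
)

# Option 1 (what worked in the question): factor -> integer level codes
as_int <- toy
is_fac <- vapply(as_int, is.factor, logical(1))
as_int[is_fac] <- lapply(as_int[is_fac], as.integer)
str(as_int)

# Option 2: treatment-coded dummy columns (first level of each factor is
# the baseline); [, -1] drops the intercept column
dummies <- model.matrix(~ ., data = toy)[, -1]
print(colnames(dummies))  # "MakePontiac" "MakeToyota" "AccidentAreaUrban" "Age"
```

Either result converts cleanly with `as.matrix()` to a numeric matrix, so the variance check no longer sees character columns.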