R Boruta package - (list) object cannot be coerced to type 'double'
I am trying to run Boruta feature selection on my dataset.
The code is as follows:
df<-read.csv('F:/DataAnalyticsClub/DACaseComp/DatasetDist/Datasets/BestFile.csv',stringsAsFactors=FALSE )
install.packages("Boruta")
library(Boruta)
df[is.na(df)] <- 0
df[df == ""] <- 0
X<-df[ , -which(names(df) %in% c("PREVSALEDATE","PREVSALEDATE2","ClassLabel", "PARID", "PROPERTYUNIT", "PriceDiff1", "PriceDiff2", "DateDiff1", "DateDiff2", "SALEDATE"))]
Y<-df['ClassLabel']
factorCols <- c("SCHOOLDESC","MUNIDESC","SALEDESC","INSTRTYPDESC","NEIGHDESC","TAXDESC","TAXSUBCODE_DESC","OWNERDESC","USEDESC","LOTAREA","CLEANGREEN","FARMSTEADFLAG","ABATEMENTFLAG","COUNTYEXEMPTBLDG","STYLEDESC","EXTFINISH_DESC","ROOFDESC","BASEMENTDESC","GRADEDESC","CONDITIONDESC","CDUDESC","HEATINGCOOLINGDESC","BSMTGARAGE")
nonFactorCols<-c("PRICE","COUNTYTOTAL","LOCALTOTAL","FAIRMARKETTOTAL","STORIES","YEARBLT","TOTALROOMS","BEDROOMS","FULLBATHS","HALFBATHS","FIREPLACES","FINISHEDLIVINGAREA","PREVSALEPRICE","PREVSALEPRICE2")
X[factorCols] <- lapply(X[factorCols], factor)
set.seed(123)
boruta.train<-Boruta(X,Y)
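As you can see, my dataset has a mix of features. Some of them are string features, so I convert those to factors. The rest are numeric, and I checked that this assumption holds.
As soon as I run Boruta, I get: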
Error in data.matrix(data.selected) :
(list) object cannot be coerced to type 'double'
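I don't know why. All of my columns are either factors or various numeric types. What is the problem?
After some googling, I found that some people suggest converting with as.matrix(), but in that case: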
> boruta.train<-Boruta(as.matrix(X),as.matrix(Y))
Error: Variable none not found. Ranger will EXIT now.
Error in ranger::ranger(data = x, dependent.variable.name = "shadow.Boruta.decision", :
User interrupt or internal error.
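OK, after some experimenting I managed to work out what the problem was. Boruta requires Y (the target) to be a plain vector, not a data frame. df['ClassLabel'] returns a one-column data frame, which is a list internally, hence the "(list) object" in the error message, whereas df[,'ClassLabel'] returns the column itself as a vector.
So create Y like this:
Y<-df[,'ClassLabel']
Problem solved.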
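The difference between the two indexing forms can be checked with a minimal base-R sketch. The `toy` data frame and its values are hypothetical stand-ins for the real CSV; only the column names are borrowed from the question. The last lines also show why the as.matrix() attempt probably does not help: a data frame that mixes factor and numeric columns is turned into a character matrix.

```r
# Toy data frame standing in for the real dataset (hypothetical values).
toy <- data.frame(
  PRICE      = c(100, 200, 150),
  STYLEDESC  = factor(c("A", "B", "A")),
  ClassLabel = c(1, 0, 1)
)

# Single-bracket indexing by name keeps the data frame wrapper.
y_df <- toy["ClassLabel"]
class(y_df)    # "data.frame"
is.list(y_df)  # TRUE: a data frame is a list of columns

# Leaving the row slot empty drops the result to the underlying vector.
y_vec <- toy[, "ClassLabel"]
class(y_vec)   # "numeric"
is.list(y_vec) # FALSE

# Two other common ways to extract the same vector.
same_dollar <- identical(y_vec, toy$ClassLabel)
same_double <- identical(y_vec, toy[["ClassLabel"]])

# as.matrix() on mixed factor/numeric columns yields a character matrix,
# so the numeric features are no longer numeric.
x_mat <- as.matrix(toy[, c("PRICE", "STYLEDESC")])
typeof(x_mat)  # "character"
```

If ClassLabel is meant to be a class rather than a number, it may also be worth wrapping it in as.factor() before passing it to Boruta, since a numeric target is likely to be treated as a regression problem. That is an assumption about the intended task, not something stated in the question.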