How to solve the wrong variable type error when handling an imbalanced dataset with ROSE in R?
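I am learning R using the Fraud Transaction data. When I try to handle the imbalanced dataset with ROSE, I get the error "The current implementation of ROSE handles only continuous and categorical variables."
Here is what I tried: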
str(dataset)
'data.frame': 6362620 obs. of 13 variables:
$ step : int 1 1 1 1 1 1 1 1 1 1 ...
$ type : chr "PAYMENT" "PAYMENT" "TRANSFER" "CASH_OUT" ...
$ amount : num 9840 1864 181 181 11668 ...
$ nameOrig : chr "C1231006815" "C1666544295" "C1305486145" "C840083671" ...
$ oldbalanceOrg : num 170136 21249 181 181 41554 ...
$ newbalanceOrig : num 160296 19385 0 0 29886 ...
$ nameDest : chr "M1979787155" "M2044282225" "C553264065" "C38997010" ...
$ oldbalanceDest : num 0 0 0 21182 0 ...
$ newbalanceDest : num 0 0 0 0 0 ...
$ isFraud : int 0 0 1 1 0 0 0 0 0 0 ...
$ isFlaggedFraud : int 0 0 0 0 0 0 0 0 0 0 ...
$ balancedOfOrigin: num -9840 -1864 -181 -181 -11668 ...
$ balancedOfDest : num 0 0 0 21182 0 ...
datadata_ROSE <- ROSE(isFraud~., data = dataset, N = 500, seed = 111)$data
This gives the error:
Error in rose.sampl(n, N, p, ind.majo, majoY, ind.mino,
minoY, y, classy, : The current implementation of ROSE handles
only continuous and categorical variables.
My debugging attempt:
# change the isFraud attribute into category 0/1
dataset$isFraud = as.factor(dataset$isFraud)
datadata_ROSE <- ROSE(isFraud~., data = dataset, N = 500, seed = 111)$data
In the end I still could not resolve the error. How can I make the dataset suitable for ROSE?
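As you can see in the str output, type, nameOrig, and nameDest are still character rather than factor. ROSE will run once they are converted to factors. However, looking at nameOrig and nameDest, they do not seem suitable for inclusion in ROSE, since their values look like account identifiers rather than real categories.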
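To see which columns trigger the error, it can help to list every column that is neither numeric nor factor and count its distinct values. The following is a minimal sketch on a toy data frame that borrows a few values from the str output above; the names toy, is_supported, unsupported_cols, and distinct_counts are illustrative and not part of ROSE.

```r
# Toy data frame mirroring a few columns of the question's dataset
# (values are illustrative, copied from the str output).
toy <- data.frame(
  step     = c(1L, 1L, 1L, 1L),
  type     = c("PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT"),
  amount   = c(9840, 1864, 181, 181),
  nameOrig = c("C1231006815", "C1666544295", "C1305486145", "C840083671"),
  isFraud  = c(0L, 0L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Flag every column that is neither numeric (this includes integer) nor factor.
is_supported <- vapply(toy, function(col) is.numeric(col) || is.factor(col), logical(1))
unsupported_cols <- names(toy)[!is_supported]
print(unsupported_cols)

# Count distinct values to tell real categories from identifier-like columns.
distinct_counts <- vapply(toy[unsupported_cols], function(col) length(unique(col)), integer(1))
print(distinct_counts)
```

A column whose number of distinct values is close to the number of rows, as nameOrig is here, behaves like an identifier and is a candidate for dropping rather than converting.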
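library(dplyr)  # provides %>% and mutate()
library(ROSE)

# Work on a small sample of the data
dummy2 <- head(dataset, 100)
dummy2$isFraud <- as.factor(dummy2$isFraud)

# Additional part: convert the character columns to factors
dummy2 <- dummy2 %>%
  mutate(type = factor(type),
         nameDest = factor(nameDest),
         nameOrig = factor(nameOrig))

dummy3 <- ROSE(isFraud ~ ., data = dummy2, N = 500, seed = 111)$data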
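Since nameOrig and nameDest seem unsuitable as predictors, one alternative is to drop them instead of converting them to factors. The sketch below assumes a hypothetical helper, prepare_for_rose (not part of ROSE), and a small made-up data frame with the same column roles as in the question. The ROSE call itself only runs if the package is installed.

```r
# Hypothetical helper (not part of ROSE): drop identifier-like columns,
# turn the remaining character columns into factors, and make the target a factor.
prepare_for_rose <- function(df, id_cols, target) {
  df <- df[, setdiff(names(df), id_cols), drop = FALSE]
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], factor)
  df[[target]] <- factor(df[[target]])
  df
}

# Small made-up data set with the same column roles as in the question.
set.seed(1)
n <- 200
toy <- data.frame(
  type          = sample(c("PAYMENT", "TRANSFER", "CASH_OUT"), n, replace = TRUE),
  amount        = round(runif(n, 0, 10000), 2),
  oldbalanceOrg = round(runif(n, 0, 50000), 2),
  nameOrig      = paste0("C", seq_len(n)),
  nameDest      = paste0("M", seq_len(n)),
  isFraud       = rep(c(0L, 1L), times = c(180, 20)),
  stringsAsFactors = FALSE
)

prepared <- prepare_for_rose(toy, id_cols = c("nameOrig", "nameDest"), target = "isFraud")
str(prepared)

# Run ROSE only when the package is available.
if (requireNamespace("ROSE", quietly = TRUE)) {
  balanced <- ROSE::ROSE(isFraud ~ ., data = prepared, N = 500, seed = 111)$data
  print(table(balanced$isFraud))
}
```

Dropping the identifier columns keeps the formula isFraud ~ . from treating each account name as its own category. Whether that is appropriate for the real data is a modelling decision, not something ROSE requires.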