How to solve the wrong variable type error when handling an imbalanced dataset with ROSE in R?

I am learning R using the Fraud Transaction data. When I try to handle the imbalanced dataset with ROSE, I get the error "handles only continuous and categorical variables".

Here is what I tried:

str(dataset)
'data.frame':   6362620 obs. of  13 variables:
 $ step            : int  1 1 1 1 1 1 1 1 1 1 ...
 $ type            : chr  "PAYMENT" "PAYMENT" "TRANSFER" "CASH_OUT" ...
 $ amount          : num  9840 1864 181 181 11668 ...
 $ nameOrig        : chr  "C1231006815" "C1666544295" "C1305486145" "C840083671" ...
 $ oldbalanceOrg   : num  170136 21249 181 181 41554 ...
 $ newbalanceOrig  : num  160296 19385 0 0 29886 ...
 $ nameDest        : chr  "M1979787155" "M2044282225" "C553264065" "C38997010" ...
 $ oldbalanceDest  : num  0 0 0 21182 0 ...
 $ newbalanceDest  : num  0 0 0 0 0 ...
 $ isFraud         : int  0 0 1 1 0 0 0 0 0 0 ...
 $ isFlaggedFraud  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ balancedOfOrigin: num  -9840 -1864 -181 -181 -11668 ...
 $ balancedOfDest  : num  0 0 0 21182 0 ...

datadata_ROSE <- ROSE(isFraud~., data = dataset, N = 500, seed = 111)$data

This gives the error:

Error in rose.sampl(n, N, p, ind.majo, majoY, ind.mino, minoY, y, classy, : The current implementation of ROSE handles only continuous and categorical variables.

My attempt at debugging:

# change the isFraud attribute into category 0/1
dataset$isFraud = as.factor(dataset$isFraud)
datadata_ROSE <- ROSE(isFraud~., data = dataset, N = 500, seed = 111)$data

This still does not resolve the error. How can I make the dataset suitable for ROSE?
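One way to see which columns might trigger this error is to list the columns that are neither numeric nor factor. The following is a minimal sketch in base R; `toy` is a hypothetical stand-in for the real `dataset`, and `bad_cols` is just an illustrative name.

```r
# Hypothetical toy data frame mimicking some column types from the question.
toy <- data.frame(
  type     = c("PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"),
  amount   = c(9840, 181, 181, 11668),
  nameOrig = c("C1", "C2", "C3", "C4"),
  isFraud  = c(0L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# A column is acceptable here if it is numeric (including integer) or a factor.
ok <- vapply(toy, function(col) is.numeric(col) || is.factor(col), logical(1))

# Names of the columns that are neither, e.g. character columns.
bad_cols <- names(toy)[!ok]
print(bad_cols)
```

Running the same check on the real data frame (replacing `toy` with `dataset`) should point at the character columns.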

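As you can see in the str output, type, nameOrig, and nameDest are still characters, not factors. ROSE will work once they are converted to factors. However, looking at nameOrig and nameDest, they appear to be identifiers rather than real categories, so they do not seem suitable to include in ROSE.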

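library(ROSE)
library(dplyr)   # provides %>% and mutate()

# work on the first 100 rows only
dummy2 <- head(dataset, 100)

# the response must be a factor
dummy2$isFraud = as.factor(dummy2$isFraud)

# additional part: convert the character columns to factors
dummy2 <- dummy2 %>%
  mutate(type = factor(type),
         nameDest = factor(nameDest),
         nameOrig = factor(nameOrig))
dummy3 <- ROSE(isFraud~., data = dummy2, N = 500, seed = 111)$data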
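If the identifier-like columns are left out, as suggested above, the preparation could look something like the sketch below. This is only an illustration under stated assumptions: `toy` and `toy_clean` are hypothetical names, the data is randomly generated rather than taken from the real dataset, and the ROSE call is guarded so that the rest runs even when the package is not installed.

```r
# Hypothetical toy data; the real `dataset` from the question is not reproduced here.
set.seed(1)
n <- 200
toy <- data.frame(
  type     = sample(c("PAYMENT", "TRANSFER", "CASH_OUT"), n, replace = TRUE),
  amount   = runif(n, 0, 10000),
  nameOrig = paste0("C", seq_len(n)),   # unique per row, like an ID
  nameDest = paste0("M", seq_len(n)),   # unique per row, like an ID
  isFraud  = c(rep(1L, 20), rep(0L, n - 20)),
  stringsAsFactors = FALSE
)

# Drop the identifier-like columns instead of turning them into factors.
toy_clean <- toy[, !(names(toy) %in% c("nameOrig", "nameDest"))]

# Convert the remaining character column and the response to factors.
toy_clean$type    <- factor(toy_clean$type)
toy_clean$isFraud <- factor(toy_clean$isFraud)

str(toy_clean)

# Run ROSE only if the package is available.
if (requireNamespace("ROSE", quietly = TRUE)) {
  balanced <- ROSE::ROSE(isFraud ~ ., data = toy_clean, N = 500, seed = 111)$data
  print(table(balanced$isFraud))
}
```

Whether dropping nameOrig and nameDest is appropriate depends on the analysis; the sketch only shows one way to get a data frame that contains nothing but numeric and factor columns.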