Why does my code create NAs when splitting data in R?

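I am trying to split my training set into two sets: a training set and a validation set. The split does happen, but for some reason 32 rows in the validation set are dropped and replaced with NAs. There are no NAs in the original data set.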

Here is the code:

set.seed(123)
sample <- sample.int(n = nrow(traindata), size = floor(.2*nrow(traindata)), replace = F)
traindata <- traindata[-sample, ] #creating training set
validatiedata  <- traindata[sample, ] #creating validation set

print(traindata)
head(traindata)
tail(traindata)

print(validatiedata)
head(validatiedata)
tail(validatiedata)

I also tried different code to split the data:

library(caTools)
set.seed(123)
split = sample.split(traindata, SplitRatio = 0.8)

# Create training and testing sets
train = subset(traindata, split == TRUE)
test = subset(traindata, split == FALSE)

dim(train); dim(test)

head(traindata)
tail(traindata)

head(validatiedata)
tail(validatiedata)

This second version does not work either. It splits the data incorrectly and also produces NAs in the validation set.

Any suggestions?

You are creating the data frames traindata and validatiedata in the wrong order:

traindata <- traindata[-sample, ] # Removes the sampled rows and overwrites traindata
validatiedata  <- traindata[sample, ] # Indexes the already-shrunk traindata; row numbers past its end return NA rows
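To see why this yields NAs, here is a minimal sketch on a toy data frame. The names df, idx, smaller and bad are made up for illustration and are not from your code. Asking a data frame for a row number beyond its last row returns a row of NAs rather than an error:

```r
# Toy stand-in for traindata: 10 rows (hypothetical data)
df <- data.frame(id = 1:10, x = letters[1:10])
idx <- c(2, 9, 10)        # pretend these are the sampled row numbers

smaller <- df[-idx, ]     # 7 rows remain: ids 1, 3, 4, 5, 6, 7, 8
bad <- smaller[idx, ]     # positions 9 and 10 no longer exist in smaller

nrow(smaller)             # 7
sum(is.na(bad$id))        # 2 (one NA row per out-of-range index)
bad$id[1]                 # 3 (position 2 of smaller is a training row)
```

Note that the rows which do come back are not the sampled ones either: they are whatever now sits at those positions in the reduced data frame, so the validation set overlaps with the training set.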

If you change the order, you will not run into this problem:

validatiedata  <- traindata[sample, ]
traindata <- traindata[-sample, ]
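As a self-contained sketch of the same idea, the example below uses a made-up data frame (full, with hypothetical columns id and y) and separate names for each piece, so the original data is never overwritten and the order of the two subsetting lines stops mattering:

```r
set.seed(123)

# Hypothetical data standing in for the original traindata
full <- data.frame(id = 1:100, y = rnorm(100))

# Sample 20% of the row numbers for validation
idx <- sample.int(n = nrow(full), size = floor(0.2 * nrow(full)), replace = FALSE)

valid_set <- full[idx, ]    # the sampled rows
train_set <- full[-idx, ]   # everything else

nrow(valid_set)                                 # 20
nrow(train_set)                                 # 80
anyNA(valid_set)                                # FALSE
length(intersect(valid_set$id, train_set$id))   # 0, no overlap
```

As for the caTools attempt: sample.split expects a vector of labels (for example one column of the data frame) as its first argument, not the whole data frame, so passing traindata itself is probably why that split looks wrong. That snippet also prints validatiedata rather than test, so the NAs you see there are likely left over from the first attempt.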