Why does my code create NAs when splitting data in R?
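I am trying to split my training set into two groups: a training set and a validation set. The split runs, but for some reason 32 rows in the validation set are dropped and replaced with NA. The original dataset contains no NAs.
Here is the code: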
set.seed(123)
sample <- sample.int(n = nrow(traindata), size = floor(.2*nrow(traindata)), replace = F)
traindata <- traindata[-sample, ] #creating training set
validatiedata <- traindata[sample, ] #creating validation set
print(traindata)
head(traindata)
tail(traindata)
print(validatiedata)
head(validatiedata)
tail(validatiedata)
I also tried different code to split the data:
library(caTools)
set.seed(123)
split = sample.split(traindata, SplitRatio = 0.8)
# Create training and testing sets
train = subset(traindata, split == TRUE)
test = subset(traindata, split == FALSE)
dim(train); dim(test)
head(traindata)
tail(traindata)
head(validatiedata)
tail(validatiedata)
This second piece of code does not work either. It splits the data incorrectly and also creates NAs in the validation set.
Any suggestions?
You are creating the data frames traindata and validatiedata in the wrong order:
traindata <- traindata[-sample, ] # Removes the sampled rows from traindata
validatiedata <- traindata[sample, ] # Tries to extract rows that no longer exist, resulting in NAs
If you swap the order, the problem goes away:
validatiedata <- traindata[sample, ]
traindata <- traindata[-sample, ]
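To see the effect of the ordering, here is a minimal sketch on a stand-in data set. The built-in mtcars data frame and the names idx, shrunk, bad_validation and traindata_new are assumptions made for illustration only; they are not from the question. The index vector is called idx rather than sample so that it does not share a name with the base function sample().

```r
# Stand-in for the asker's data (assumption: built-in mtcars, 32 rows, no NAs)
traindata <- mtcars
n <- nrow(traindata)

set.seed(123)
# Draw 20% of the row positions without replacement
idx <- sample.int(n = n, size = floor(0.2 * n), replace = FALSE)

# Wrong order: the second line indexes the already-reduced data frame,
# so any position greater than nrow(shrunk) comes back as an all-NA row,
# and the remaining positions point at different rows than intended
shrunk <- traindata[-idx, ]
bad_validation <- shrunk[idx, ]

# Right order: take both subsets from the original, untouched data frame
validatiedata <- traindata[idx, ]
traindata_new <- traindata[-idx, ]

# Sanity checks: every row lands in exactly one set and no NAs appear
stopifnot(nrow(validatiedata) + nrow(traindata_new) == n)
stopifnot(!anyNA(validatiedata))
```

Also note that the second snippet in the question builds train and test but then prints traindata and validatiedata, so the NAs shown there are most likely still the ones left over from the first attempt. Separately, and as far as I know, sample.split() in caTools expects a vector of labels with one value per row, so passing the whole data frame is probably not doing what was intended.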