Replace NA values with random values
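I want to replace NAs with random values. This data frame has non-numeric columns such as "DayOfWeek", and I don't know how to fill in the missing values. I tried the missForest function, but I think it only works on columns with integers. Do you know how I can fill in all the columns?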
travel <- read.csv("https://openmv.net/file/travel-times.csv")
library(missForest)
summary(travel)
set.seed(82)
# Artificially set 20% of the values to NA
travel1 <- prodNA(travel, noNA = 0.2)
# Impute the missing values with missForest
travel2 <- missForest(travel1)
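First, if you want empty strings ("") to be read as NAs, you need to add the extra argument na.strings = "" to read.csv. Then, do you mean replacing the NA observations of a variable with other, randomly drawn observations of the same variable? If so, consider the following procedure: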
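travel <- read.csv("https://openmv.net/file/travel-times.csv", na.strings = "")
set.seed(82)
# For each column, fill its NAs with values sampled (with replacement)
# from that same column's observed, non-missing values
res <- data.frame(lapply(travel, function(x) {
  is_na <- is.na(x)
  replace(x, is_na, sample(x[!is_na], sum(is_na), replace = TRUE))
}))
res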
The result looks like this:
Date StartTime DayOfWeek GoingTo Distance MaxSpeed AvgSpeed AvgMovingSpeed FuelEconomy TotalTime MovingTime Take407All Comments
1 1/6/2012 16:37 Friday Home 51.29 127.4 78.3 84.8 8.5 39.3 36.3 No Medium amount of rain
2 1/6/2012 08:20 Friday GSK 51.63 130.3 81.8 88.9 8.5 37.9 34.9 No Put snow tires on
3 1/4/2012 16:17 Wednesday Home 51.27 127.4 82.0 85.8 8.5 37.5 35.9 No Heavy rain
4 1/4/2012 07:53 Wednesday GSK 49.17 132.3 74.2 82.9 8.31 39.8 35.6 No Accident blocked 407 exit
5 1/3/2012 18:57 Tuesday Home 51.15 136.2 83.4 88.1 9.08 36.8 34.8 No Rain, rain, rain
6 1/3/2012 07:57 Tuesday GSK 51.80 135.8 84.5 88.8 8.37 36.8 35.0 No Backed up at Bronte
7 1/2/2012 17:31 Monday Home 51.37 123.2 82.9 87.3 - 37.2 35.3 No Pumped tires up: check fuel economy improved?
8 1/2/2012 07:34 Monday GSK 49.01 128.3 77.5 85.9 - 37.9 34.3 No Pumped tires up: check fuel economy improved?
9 12/23/2011 08:01 Friday GSK 52.91 130.3 80.9 88.3 8.89 39.3 36.0 No Police slowdown on 403
10 12/22/2011 17:19 Thursday Home 51.17 122.3 70.6 78.1 8.89 43.5 39.3 No Start early to run a batch
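You can use the imputeTS package to insert random values into your time series. The function na_random can be used for this. It works on numeric columns (other columns are left unchanged, which can be useful, since you probably don't want random text in the Comments column).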
You can simply call:
library("imputeTS")
na_random(yourData)
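The function looks up the lowest and highest value of each column and inserts random values within that range.
But you can also define your own bounds for the random values, like this: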
library("imputeTS")
na_random(yourData, lower_bound = 0, upper_bound = 25)
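To make the behaviour described above concrete, here is a minimal base-R sketch that imitates it without imputeTS. It is an approximation under stated assumptions, not imputeTS's implementation: the helper `impute_uniform` is hypothetical, the random values are assumed to be drawn uniformly (via `runif()`), and the toy data frame reuses a few values from the printed output.

```r
# Hypothetical sketch: fill NAs in numeric columns with uniform random values
# between lower_bound and upper_bound. By default the bounds are the column's
# observed min and max. Non-numeric and all-NA columns are returned unchanged.
impute_uniform <- function(df, lower_bound = NULL, upper_bound = NULL) {
  df[] <- lapply(df, function(x) {
    if (!is.numeric(x) || !anyNA(x) || all(is.na(x))) return(x)
    lo <- if (is.null(lower_bound)) min(x, na.rm = TRUE) else lower_bound
    hi <- if (is.null(upper_bound)) max(x, na.rm = TRUE) else upper_bound
    is_na <- is.na(x)
    x[is_na] <- runif(sum(is_na), min = lo, max = hi)
    x
  })
  df
}

df <- data.frame(
  DayOfWeek = c("Friday", NA, "Wednesday", "Tuesday"),
  MaxSpeed  = c(127.4, NA, 132.3, NA),
  AvgSpeed  = c(78.3, 81.8, NA, 84.5)
)

set.seed(82)
default_fill <- impute_uniform(df)  # bounds taken from each column
custom_fill  <- impute_uniform(df, lower_bound = 0, upper_bound = 25)
print(default_fill)
print(custom_fill)
```

Note that DayOfWeek keeps its NA in both results, matching the description that only numeric columns are filled.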
For your data, this could look as follows:
library("imputeTS")
# To read the input correctly and have the right data types
travel <- read.csv("https://openmv.net/file/travel-times.csv", na.strings = "")
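# FuelEconomy contains "-" entries, so it is read as text; as.numeric() turns them into NA (with a coercion warning)
travel$FuelEconomy <- as.numeric(travel$FuelEconomy)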
# To perform the missing data replacement
travel <- na_random(travel)
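The `as.numeric()` step matters because FuelEconomy contains "-" entries (visible in rows 7 and 8 of the output above), so read.csv() reads the whole column as text and na_random would skip it. The sketch below uses a small made-up inline CSV, an assumption standing in for the real file, to show that coercion next to an alternative: listing "-" in `na.strings` so the column is numeric from the start.

```r
# Illustrative stand-in data; "-" marks a missing fuel-economy reading.
csv <- "FuelEconomy,TotalTime
8.5,39.3
-,37.2
8.31,
9.08,36.8"

# With only "" as the NA marker, FuelEconomy comes back as character
raw <- read.csv(text = csv, na.strings = "")
print(class(raw$FuelEconomy))

# Option used above: coerce afterwards ("-" becomes NA; the expected
# "NAs introduced by coercion" warning is silenced here)
a <- raw
a$FuelEconomy <- suppressWarnings(as.numeric(a$FuelEconomy))

# Alternative: declare "-" as an NA marker at read time
b <- read.csv(text = csv, na.strings = c("", "-"))

print(a$FuelEconomy)
print(b$FuelEconomy)
```

Both routes give the same numeric column; the `na.strings` route just avoids the warning and keeps the type handling in one place.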