Problems with imputing missing values with kNN in R
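I want to impute missing values with the mean of the nearest neighbours, but when I try kNN, it gives an error message.
The vector contains stock prices, so I have NAs on weekends. I want to replace the NA values (Saturday, Sunday) with the average of the surrounding trading days: (Friday value + Monday value)/2. I thought kNN with k = 2 would be suitable, but the call fails with the error below.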
> Oriental_Stock$Stock
 [1] 42.80 43.05 43.00 43.00 42.20    NA    NA 42.50 40.00 40.25 40.55
[12] 41.50    NA    NA 40.85
> kNN(Oriental_Stock, variable = colnames("Stock"), k = 2)
Error in `[.data.table`(data, indexNA2s[, variable[i]], `:=`(imp_vars[i],
: i is invalid type (matrix). Perhaps in future a 2 column matrix could
return a list of elements of DT (in the spirit of A[B] in FAQ 2.14).
Please report to data.table issue tracker if you'd like this, or add
your comments to FR #657.
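Please let me know whether this can be done; perhaps there is a simpler option than kNN. I'm not a data scientist, just a student, so I don't know much about this. Thanks in advance for any advice!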
kNN works on a data.frame: it chooses neighbours based on the distance between rows. It does not work on a plain vector.
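To make "distance between rows" concrete, here is a minimal base-R sketch of kNN imputation. It is not the implementation of the kNN() function used in the question; the helper knn_impute_1d and the day column are hypothetical. Distance is the absolute difference in day number, and the k nearest rows with an observed price are averaged. It also suggests why k = 2 does not reproduce (Friday + Monday)/2: for Saturday, Thursday and Monday are both two days away, and the tie is broken by row order, so Saturday's donors are Friday and Thursday. (Separately, colnames("Stock") in the question evaluates to NULL, because a plain character string has no column names; the column is named directly as "Stock".)

```r
# Minimal kNN imputation sketch (hypothetical helper, not a library function).
# For each row where `target` is NA, find the k rows with an observed `target`
# whose `by` value is closest, and impute the mean of their `target` values.
knn_impute_1d <- function(df, target, by, k = 2) {
  donors <- which(!is.na(df[[target]]))       # rows that can act as neighbours
  for (i in which(is.na(df[[target]]))) {
    d <- abs(df[[by]][donors] - df[[by]][i])  # distance between rows
    # order() leaves ties in their original (row) order, so equal distances
    # favour the earlier row
    nearest <- donors[order(d)][seq_len(k)]
    df[[target]][i] <- mean(df[[target]][nearest])
  }
  df
}

stock <- c(42.80, 43.05, 43.00, 43.00, 42.20, NA, NA, 42.50, 40.00, 40.25, 40.55,
           41.50, NA, NA, 40.85)
# The vector alone carries no distance information, so add a day-number column.
oriental <- data.frame(day = seq_along(stock), Stock = stock)
out <- knn_impute_1d(oriental, target = "Stock", by = "day", k = 2)
out$Stock[c(6, 7)]  # Saturday: Friday + Thursday; Sunday: Monday + Friday
```

Saturday and Sunday end up with different values here, which is why a loop that always takes the last price before the gap and the first price after it fits the weekend rule better than k = 2 nearest neighbours.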
A for loop may be a fair solution:
# find the position of the first NA in each pair of consecutive NAs (Sat, Sun);
# the recycled c(TRUE, FALSE) index keeps every other NA position
# (assumes every gap is exactly two NAs with observed prices on both sides)
idx <- which(is.na(stock))[c(TRUE, FALSE)]
# for each pair, set both NAs to the mean of the day before (Friday)
# and the day after (Monday)
for (x in idx) {
  stock[x] <- stock[x+1] <- (stock[x-1] + stock[x+2]) / 2
}
Result:
> stock
[1] 42.800 43.050 43.000 43.000 42.200 42.350 42.350 42.500 40.000
[10] 40.250 40.550 41.500 41.175 41.175 40.850
I used the following stock vector as the data:
stock <- c(42.80,43.05, 43.00, 43.00, 42.20, NA, NA, 42.50, 40.00, 40.25, 40.55,
41.50, NA, NA, 40.85)
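The loop relies on every gap being exactly two NAs with prices on both sides, so it would not fill a three-day gap (for example a holiday Monday) correctly. Below is a minimal sketch that applies the same (Friday + Monday)/2 rule to gaps of any length, assuming that NAs at the very start or end of the series should simply stay NA. The name fill_gap_mean is a hypothetical helper, not an existing function.

```r
# Fill each NA with (last observed value before it + first observed value
# after it) / 2, i.e. (Friday + Monday) / 2 for a weekend gap.
# Works for gaps of any length; NAs at the very start or end stay NA
# because one side of the gap has no observed value.
fill_gap_mean <- function(x) {
  obs <- which(!is.na(x))  # positions of observed prices (computed once)
  for (i in which(is.na(x))) {
    before <- obs[obs < i]
    after  <- obs[obs > i]
    if (length(before) > 0 && length(after) > 0) {
      x[i] <- (x[max(before)] + x[min(after)]) / 2
    }
  }
  x
}

stock <- c(42.80, 43.05, 43.00, 43.00, 42.20, NA, NA, 42.50, 40.00, 40.25, 40.55,
           41.50, NA, NA, 40.85)
filled <- fill_gap_mean(stock)
filled
```

Because obs is computed before any value is filled, imputed values are never used to fill other gaps; only the original observed prices act as the Friday and Monday values.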