KNN outlier detection in R
I am trying to run a script that performs outlier detection using a weighted KNN outlier score, but I keep getting the following error:
Error in apply(kNNdist(x = dat, k = k), 1, mean) :
dim(X) must have a positive length
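The script I am trying to run is below. It is a single script block, but I have added a comment directly above the part that causes the error, namely this line: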
WKNN_Outlier <- apply(kNNdist(x=dat, k = k), 1, mean)
If anyone has a better or simpler idea for unsupervised outlier detection, I'm all ears (so to speak...)
library(dbscan)
library(ggplot2)
set.seed(0)
x11 <- rnorm(n = 100, mean = 10, sd = 1) # Cluster 1 (x1 coordinate)
x21 <- rnorm(n = 100, mean = 10, sd = 1) # Cluster 1 (x2 coordinate)
x12 <- rnorm(n = 100, mean = 20, sd = 1) # Cluster 2 (x1 coordinate)
x22 <- rnorm(n = 100, mean = 10, sd = 1) # Cluster 2 (x2 coordinate)
x13 <- rnorm(n = 100, mean = 15, sd = 3) # Cluster 3 (x1 coordinate)
x23 <- rnorm(n = 100, mean = 25, sd = 3) # Cluster 3 (x2 coordinate)
x14 <- rnorm(n = 50, mean = 25, sd = 1) # Cluster 4 (x1 coordinate)
x24 <- rnorm(n = 50, mean = 25, sd = 1) # Cluster 4 (x2 coordinate)
dat <- data.frame(x1 = c(x11,x12,x13,x14), x2 = c(x21,x22,x23,x24))
( g0a <- ggplot() + geom_point(data=dat, mapping=aes(x=x1, y=x2), shape = 19) )
k <- 4 # KNN parameter
top_n <- 20 # No. of top outliers to be displayed
KNN_Outlier <- kNNdist(x=dat, k = k)
rank_KNN_Outlier <- order(x=KNN_Outlier, decreasing = TRUE) # Sorting (descending)
KNN_Result <- data.frame(ID = rank_KNN_Outlier, score = KNN_Outlier[rank_KNN_Outlier])
head(KNN_Result, top_n)
graph <- g0a +
geom_point(data=dat[rank_KNN_Outlier[1:top_n],], mapping=aes(x=x1,y=x2), shape=19,
color="red", size=2) +
geom_text(data=dat[rank_KNN_Outlier[1:top_n],],
mapping=aes(x=(x1-0.5), y=x2, label=rank_KNN_Outlier[1:top_n]), size=2.5)
graph
## Use KNNdist() to calculate the weighted KNN outlier score
k <- 4 # KNN parameter
top_n <- 20 # No. of top outliers to be displayed
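# The WKNN_Outlier line below is what causes the error. As far as I can tell, apply() should
# have no problem, since the data (dat) is a data.frame, which should prevent the error, but it doesn't.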
WKNN_Outlier <- apply(kNNdist(x=dat, k = k), 1, mean) # Weighted KNN outlier score (mean)
rank_WKNN_Outlier <- order(x=WKNN_Outlier, decreasing = TRUE)
WKNN_Result <- data.frame(ID = rank_WKNN_Outlier, score = WKNN_Outlier[rank_WKNN_Outlier])
head(WKNN_Result, top_n)
ge1 <- g0a +
geom_point(data=dat[rank_WKNN_Outlier[1:top_n],], mapping=aes(x=x1,y=x2), shape=19,
color="red", size=2) +
geom_text(data=dat[rank_WKNN_Outlier[1:top_n],],
mapping=aes(x=(x1-0.5), y=x2, label=rank_WKNN_Outlier[1:top_n]), size=2.5)
ge1
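The call kNNdist(x=dat, k = k) returns a vector rather than a matrix. That is why apply reports dim(X) must have a positive length: a vector's dim is NULL, so with MARGIN = 1 there are no rows for apply to iterate over.
Try requesting the distances to all k neighbors, which gives a matrix: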
WKNN_Outlier <- apply(kNNdist(x=dat, k = k, all=TRUE), 1, mean)
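To see the difference for yourself, here is a minimal sketch that compares the two return shapes. It assumes a recent version of dbscan in which kNNdist() returns a plain vector unless all = TRUE is passed. The small `dat` here is made-up illustration data, not the data from the question, and the names `d_k` and `d_all` are hypothetical. rowMeans() is used as an equivalent, shorter alternative to apply(..., 1, mean).

```r
library(dbscan)

set.seed(0)
# Illustration data only (assumption): 50 random 2-D points
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
k <- 4

# Default call: one distance per point (to its k-th nearest neighbor)
d_k <- kNNdist(x = dat, k = k)

# all = TRUE: one row per point, one column per neighbor 1..k
d_all <- kNNdist(x = dat, k = k, all = TRUE)

print(dim(d_k))    # no dim attribute on a vector, which is what apply() complains about
print(dim(d_all))  # rows = number of points, columns = k

# Weighted KNN score: mean distance to the k nearest neighbors
WKNN_Outlier <- rowMeans(d_all)
head(order(WKNN_Outlier, decreasing = TRUE))  # indices of the highest-scoring points
```

The rest of the script (ranking with order() and plotting the top_n points) should then work unchanged, since WKNN_Outlier is again a numeric vector with one score per row of dat.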