Is the case number equivalent to the data row number in lime?
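I just discovered the lime package in R and am still trying to fully understand it. I am confused by the visualization produced by 'plot_features'.
Please forgive my naivety.
My question: are the case numbers sequential by row? In other words, is case 416 equivalent to row 416 of the data? If not, how do I know which row each case number refers to?
Sample code to reproduce the plot: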
library(MASS)
library(lime)
data(biopsy)
biopsy$ID <- NULL
biopsy <- na.omit(biopsy)
biopsy2 = data.frame(ID = 1:nrow(biopsy), biopsy)
names(biopsy2) <- c('ID', 'clump thickness', 'uniformity of cell size',
                    'uniformity of cell shape', 'marginal adhesion',
                    'single epithelial cell size', 'bare nuclei',
                    'bland chromatin', 'normal nucleoli', 'mitoses',
                    'class')
# Now we'll fit a linear discriminant model on all but 4 cases
set.seed(4)
test_set <- sample(seq_len(nrow(biopsy2)), 4)
prediction <- biopsy2$class
biopsy2$class <- NULL
model <- lda(biopsy2[-test_set, ], prediction[-test_set])
predict(model, biopsy2[test_set, ])
explainer <- lime(biopsy2[-test_set,], model, bin_continuous = TRUE, quantile_bins = FALSE)
explanation <- explain(biopsy2[test_set, ], explainer, n_labels = 1, n_features = 4)
plot_features(explanation, ncol = 1)
Edit: added an extra column named ID to the biopsy table.
As you can see in explanation, the plot goes through the cases one by one, starting from the first:
head(explanation[, 1:5])
      model_type case  label label_prob  model_r2
1 classification  416 benign  0.9943635 0.5432439
2 classification  416 benign  0.9943635 0.5432439
3 classification  416 benign  0.9943635 0.5432439
4 classification  416 benign  0.9943635 0.5432439
5 classification    7 benign  0.9527375 0.6586789
6 classification    7 benign  0.9527375 0.6586789
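However, since each case spans several rows of explanation, it can be useful to know which rows belong to a given case, such as case 416. For that you can use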
which(416 == explanation$case)
# [1] 1 2 3 4
So
explanation[which(416 == explanation$case), 1:5]
#       model_type case  label label_prob model_r2
# 1 classification  416 benign  0.9949716 0.551287
# 2 classification  416 benign  0.9949716 0.551287
# 3 classification  416 benign  0.9949716 0.551287
# 4 classification  416 benign  0.9949716 0.551287
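For the other half of the question (which row of the data a case number refers to), here is a minimal base R sketch. It assumes, and this is an assumption rather than something shown above, that lime fills the case column from the row names of the data frame passed to explain(). The toy data frame and the names toy, cleaned, picked and case_label are hypothetical and only illustrate how row names behave.

```r
# Toy stand-in for biopsy: row 3 has a missing value (hypothetical data).
toy <- data.frame(x = c(10, 20, NA, 40, 50),
                  y = c("a", "b", "c", "d", "e"))

# na.omit() drops row 3 but keeps the original row names of the other rows.
cleaned <- na.omit(toy)
rownames(cleaned)                     # "1" "2" "4" "5"

# Subsetting by position keeps row names too:
# the 3rd remaining row is still named "4".
picked <- cleaned[3, ]
case_label <- rownames(picked)        # "4"

# Map a case label back to its position in the cleaned data ...
match(case_label, rownames(cleaned))  # 3
# ... or look the row up directly by its name.
cleaned[case_label, ]
```

Under that assumption, case 416 in the example would be the row whose row name is "416". Because na.omit() removed rows from biopsy, that may not be the 416th row of biopsy2, which is something the added ID column can help check.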