如何使用 R 生成类似(可能更好)的混淆矩阵 table / 数据框(如下图所示)
How to produce a similar (possibly better) confusion matrix table / data frame (as shown in the photo below) using R
我有我的机器学习模型的混淆矩阵结果,我必须展示我的结果。我使用下图所示的 Microsoft Word 手动制作了以下 table。如您所见,它并不好看table,更重要的是,将结果从 R 一个一个地传输到 Microsoft Word 并进行错误的手动计算需要花费很多时间。
这是我想使用 R 生成的 table,因为我的大部分分析都是在 R 中完成的。我也非常愿意接受你的建议,让它变得更好,因为我将使用table 在科学报告中。
为了可重复性,我使用了代码 dput(cm_df)(这是我使用 as.data.frame(cm_table) 转换为 data.frame 的混淆矩阵)和得到这个结果:
structure(list(Prediction = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L,
5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("1",
"2", "3", "4", "5", "6"), class = "factor"), Reference = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L,
6L, 6L, 6L), .Label = c("1", "2", "3", "4", "5", "6"), class = "factor"),
Freq = c(1L, 0L, 0L, 0L, 0L, 0L, 1L, 9L, 0L, 0L, 1L, 0L,
1L, 2L, 12L, 1L, 2L, 0L, 0L, 4L, 1L, 0L, 1L, 1L, 0L, 7L,
1L, 0L, 15L, 0L, 0L, 0L, 2L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-36L))
有什么想法吗?
有很多用于格式化表格的选项和包,它们提供不同的输出格式(例如 markdown、html、pdf、docx 等)。
这是一个使用 huxtable
包的例子:
library(data.table)
library(huxtable)
library(dplyr)
# reformatted your cm_df data.frame
res <- dcast(as.data.table(cm_df), Prediction ~ Reference, value.var = "Freq")
# extracted the numeric matrix to calculate the statistics
mat <- data.matrix(res[,-1])
# set res as character (required for merging)
res[] <- lapply(res, as.character)
# calculate and format the statistics
eoc <- (rowSums(mat) - diag(mat))/rowSums(mat)
res[, `:=`(UA = paste0(round(100*(1-eoc)), "%"),
`Error of Commission` = paste0(round(100*eoc), "%"))]
PA <- paste0(round(100*diag(mat)/colSums(mat)), "%")
EO <- paste0(round(100*(1-diag(mat)/colSums(mat))), "%")
# combine column statistics with res
res.tab <- rbind(res, setNames(transpose(data.table(PA=PA, `Er. Omission`=EO),
keep.names = "Prediction"), colnames(res)[1:7]), fill=TRUE)
# format the table
out <- as_huxtable(res.tab) %>%
set_bold(1, everywhere, TRUE) %>%
set_bold(everywhere, 1, TRUE) %>%
set_bottom_border(1, everywhere) %>%
set_bottom_border(7, everywhere) %>%
set_left_border(everywhere, c(2,8), TRUE) %>%
set_align(1, everywhere, "center") %>%
set_align(everywhere, 1, "center") %>%
set_align(c(2:9), c(2:9), "right") %>%
set_col_width(c(0.4, rep(0.2, 6), rep(.3,2))) %>%
set_position("left")
# print table to screen (usually would export in preferred format)
print_screen(out)
#> Prediction │ 1 2 3 4 5
#> ───────────────┼────────────────────────────────
#> 1 │ 1 1 1 0 0
#> 2 │ 0 9 2 4 7
#> 3 │ 0 0 12 1 1
#> 4 │ 0 0 1 0 0
#> 5 │ 0 1 2 1 15
#> 6 │ 0 0 0 1 0
#> ───────────────┼────────────────────────────────
#> PA │ 100% 82% 67% 0% 65%
#> Er. Omission │ 0% 18% 33% 100% 35%
#>
#> Column names: Prediction, 1, 2, 3, 4, 5, 6, UA, Error of Commission
#>
#> 6/9 columns shown.
编辑:
根据要求,您可以添加以下代码来获取一些注释:
# add an empty first column and merge cells
out <- merge_down(as_huxtable(cbind(rep("", 9), out)), 2:8, 1)
# add desired label
out[2,1] <- "Classification"
# add top caption and rotate text in first column
out %>%
set_caption("Reference") %>%
set_rotation(everywhere, 1, 90)
输出(html版本):
我有我的机器学习模型的混淆矩阵结果,我必须展示我的结果。我使用下图所示的 Microsoft Word 手动制作了以下 table。如您所见,它并不好看table,更重要的是,将结果从 R 一个一个地传输到 Microsoft Word 并进行错误的手动计算需要花费很多时间。
这是我想使用 R 生成的 table,因为我的大部分分析都是在 R 中完成的。我也非常愿意接受你的建议,让它变得更好,因为我将使用table 在科学报告中。
为了可重复性,我使用了代码 dput(cm_df)(这是我使用 as.data.frame(cm_table) 转换为 data.frame 的混淆矩阵)和得到这个结果:
structure(list(Prediction = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L,
5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("1",
"2", "3", "4", "5", "6"), class = "factor"), Reference = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L,
6L, 6L, 6L), .Label = c("1", "2", "3", "4", "5", "6"), class = "factor"),
Freq = c(1L, 0L, 0L, 0L, 0L, 0L, 1L, 9L, 0L, 0L, 1L, 0L,
1L, 2L, 12L, 1L, 2L, 0L, 0L, 4L, 1L, 0L, 1L, 1L, 0L, 7L,
1L, 0L, 15L, 0L, 0L, 0L, 2L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-36L))
有什么想法吗?
有很多用于格式化表格的选项和包,它们提供不同的输出格式(例如 markdown、html、pdf、docx 等)。
这是一个使用 huxtable
包的例子:
library(data.table)
library(huxtable)
library(dplyr)
# reformatted your cm_df data.frame
res <- dcast(as.data.table(cm_df), Prediction ~ Reference, value.var = "Freq")
# extracted the numeric matrix to calculate the statistics
mat <- data.matrix(res[,-1])
# set res as character (required for merging)
res[] <- lapply(res, as.character)
# calculate and format the statistics
eoc <- (rowSums(mat) - diag(mat))/rowSums(mat)
res[, `:=`(UA = paste0(round(100*(1-eoc)), "%"),
`Error of Commission` = paste0(round(100*eoc), "%"))]
PA <- paste0(round(100*diag(mat)/colSums(mat)), "%")
EO <- paste0(round(100*(1-diag(mat)/colSums(mat))), "%")
# combine column statistics with res
res.tab <- rbind(res, setNames(transpose(data.table(PA=PA, `Er. Omission`=EO),
keep.names = "Prediction"), colnames(res)[1:7]), fill=TRUE)
# format the table
out <- as_huxtable(res.tab) %>%
set_bold(1, everywhere, TRUE) %>%
set_bold(everywhere, 1, TRUE) %>%
set_bottom_border(1, everywhere) %>%
set_bottom_border(7, everywhere) %>%
set_left_border(everywhere, c(2,8), TRUE) %>%
set_align(1, everywhere, "center") %>%
set_align(everywhere, 1, "center") %>%
set_align(c(2:9), c(2:9), "right") %>%
set_col_width(c(0.4, rep(0.2, 6), rep(.3,2))) %>%
set_position("left")
# print table to screen (usually would export in preferred format)
print_screen(out)
#> Prediction │ 1 2 3 4 5
#> ───────────────┼────────────────────────────────
#> 1 │ 1 1 1 0 0
#> 2 │ 0 9 2 4 7
#> 3 │ 0 0 12 1 1
#> 4 │ 0 0 1 0 0
#> 5 │ 0 1 2 1 15
#> 6 │ 0 0 0 1 0
#> ───────────────┼────────────────────────────────
#> PA │ 100% 82% 67% 0% 65%
#> Er. Omission │ 0% 18% 33% 100% 35%
#>
#> Column names: Prediction, 1, 2, 3, 4, 5, 6, UA, Error of Commission
#>
#> 6/9 columns shown.
编辑:
根据要求,您可以添加以下代码来获取一些注释:
# add an empty first column and merge cells
out <- merge_down(as_huxtable(cbind(rep("", 9), out)), 2:8, 1)
# add desired label
out[2,1] <- "Classification"
# add top caption and rotate text in first column
out %>%
set_caption("Reference") %>%
set_rotation(everywhere, 1, 90)
输出(html版本):