Optimizing an R Boggle Solver
Forenote: this is a follow-up to an earlier question.
I wrote a Boggle game solver in R (see this GitHub page for the source code) and found its performance disappointing.
Here is how it works:
# Say we have the following set of letters
bog.letters <- c("t", "e", "n", "s", "d", "a", "i", "o",
"l", "e", "r", "o", "c", "f", "i", "e")
# We get the list of paths (permutations) from a pre-existing list
paths <- paths.by.length[[6]] # 6th element corresponds to 8-element "paths"
dim(paths) # [1] 183472 8
# The following function is the key here,
# mapping the 183,472 combinations to the 16 letters
candidates <- apply(X = paths, MARGIN = 1, FUN = function(x) paste(bog.letters[x], collapse=""))
# The only remaining thing is to intersect the candidate words
# with the actual words from our dictionary
dict.words <- dict.fr$mot[dict.fr$taille == 8]
valid.words <- intersect(candidates, dict.words)
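As a possible first step, the row-wise `apply` call can be swapped for a column-wise `paste0`, so that all paths are pasted in one vectorized call instead of one `paste` call per path. This is a minimal sketch, not the author's method: `toy.paths` is a small hypothetical stand-in for one element of `paths.by.length`.

```r
# Board letters from the question.
bog.letters <- c("t", "e", "n", "s", "d", "a", "i", "o",
                 "l", "e", "r", "o", "c", "f", "i", "e")

# Hypothetical stand-in for a path matrix: one path per row.
toy.paths <- matrix(c(1, 2, 3, 4,
                      5, 6, 7, 8,
                      4, 3, 2, 1), ncol = 4, byrow = TRUE)

# Row-wise version from the question: one paste() call per path.
by.row <- apply(toy.paths, 1, function(x) paste(bog.letters[x], collapse = ""))

# Column-wise version: look up the letters of each column once,
# then paste all columns together in a single vectorized call.
letter.cols <- lapply(seq_len(ncol(toy.paths)),
                      function(j) bog.letters[toy.paths[, j]])
by.col <- do.call(paste0, letter.cols)

# Both approaches build the same candidate strings.
identical(by.row, by.col)
```

The number of R-level function calls drops from one per path to one per letter position, which is usually where the time goes with `apply` on a large matrix.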
Reproducible example with 13-letter candidates
bog.letters <- c("t", "e", "n", "s", "d", "a", "i", "o", "l", "e", "r", "o", "c", "f", "i", "e")
n.letters <- 13
paths <- structure(list(V1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), V2 = c(2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2), V3 = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
V4 = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), V5 = c(7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7), V6 = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6), V7 = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), V8 = c(9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9), V9 = c(10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10), V10 = c(11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 13, 13, 13, 13,
13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14), V11 = c(8, 8,
12, 12, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 14, 14,
14, 14, 14, 14, 14, 11, 11, 11, 11, 11, 11, 11, 11), V12 = c(12,
12, 15, 15, 16, 15, 15, 12, 12, 14, 16, 12, 12, 15, 15, 11,
11, 11, 11, 15, 15, 15, 8, 12, 12, 12, 15, 15, 16, 16), V13 = c(15,
16, 14, 16, 15, 12, 16, 8, 16, 13, 12, 8, 15, 12, 14, 8,
12, 15, 16, 11, 12, 16, 12, 8, 15, 16, 12, 16, 12, 15)), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11",
"V12", "V13"), row.names = c(NA, 30L), class = "data.frame")
candidates <- apply(X = paths, MARGIN = 1, FUN = function(x) paste(bog.letters[x], collapse=""))
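For such a small list of paths, this is already quite fast. But the actual number of paths for 13-letter words is 2,644,520, so it can take a minute or even longer to find all the candidates. Using doSNOW, I was able to parallelize the search and significantly reduce the total time, but this has a major drawback: with a normal loop, I can exit/break whenever I reach a point where no more words are found. This is not obvious (impossible?) to do with parallel processes.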
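For illustration, the early exit described above could look like the sequential loop below: word lengths are processed in increasing order, and the loop stops at the first length that yields no dictionary word. This is only a sketch under assumptions. `paths.by.length` and `dict` are tiny hypothetical stand-ins, and the stopping rule is the heuristic from the question, not a guarantee that longer words cannot exist.

```r
bog.letters <- c("t", "e", "n", "s", "d", "a", "i", "o",
                 "l", "e", "r", "o", "c", "f", "i", "e")

# Hypothetical stand-ins: one path matrix per word length (3 to 6 letters).
paths.by.length <- list(
  matrix(c(1, 2, 3,
           5, 6, 7), ncol = 3, byrow = TRUE),
  matrix(c(1, 2, 3, 4,
           5, 6, 1, 2), ncol = 4, byrow = TRUE),
  matrix(c(1, 2, 3, 4, 8), ncol = 5, byrow = TRUE),
  matrix(c(1, 2, 3, 4, 8, 12), ncol = 6, byrow = TRUE)
)
dict <- c("ten", "tens", "date")  # hypothetical dictionary

valid.words <- character(0)
lengths.checked <- integer(0)
for (paths in paths.by.length) {
  lengths.checked <- c(lengths.checked, ncol(paths))
  # Same candidate construction as in the question.
  candidates <- apply(paths, 1, function(x) paste(bog.letters[x], collapse = ""))
  hits <- intersect(candidates, dict)
  # Early exit: stop at the first length without any valid word.
  if (length(hits) == 0) break
  valid.words <- c(valid.words, hits)
}

valid.words
lengths.checked
```

Keeping this outer loop sequential preserves the break. If parallel workers are used, they would have to split the paths of a single length among themselves rather than each taking a different length.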
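So my question is: can you think of a better function/algorithm for this task? Some websites provide solutions to Boggle games within seconds... Either they generated all possible letter combinations and stored the results in a database (!), or they clearly use a better algorithm (and probably a compiled language) to achieve those results.
Any ideas?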
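One commonly used alternative to enumerating every path is a depth-first search over the board that abandons a path as soon as its letters are not a prefix of any dictionary word. The sketch below is an assumption-laden illustration, not the method used by the websites mentioned: it assumes `bog.letters` fills a 4 x 4 grid row by row, and `dict` is a small hypothetical stand-in for the real dictionary.

```r
bog.letters <- c("t", "e", "n", "s", "d", "a", "i", "o",
                 "l", "e", "r", "o", "c", "f", "i", "e")
n.side <- 4  # assumed 4 x 4 board, filled row by row

# Hypothetical stand-in dictionary.
dict <- c("ten", "tens", "tea", "date", "lead", "rode", "zebra")

# Store every word and every prefix of every word in environments,
# which gives hash-based lookup by name.
prefixes <- new.env(hash = TRUE)
words <- new.env(hash = TRUE)
for (w in dict) {
  assign(w, TRUE, envir = words)
  for (k in seq_len(nchar(w))) assign(substr(w, 1, k), TRUE, envir = prefixes)
}

# neighbours[[i]] lists the cells touching cell i (including diagonals).
neighbours <- lapply(seq_len(n.side^2), function(i) {
  r <- (i - 1) %/% n.side
  c0 <- (i - 1) %% n.side
  out <- integer(0)
  for (dr in -1:1) for (dc in -1:1) {
    rr <- r + dr
    cc <- c0 + dc
    if ((dr != 0 || dc != 0) && rr >= 0 && rr < n.side && cc >= 0 && cc < n.side)
      out <- c(out, rr * n.side + cc + 1)
  }
  out
})

found <- new.env(hash = TRUE)

# Extend the current prefix with the letter at `cell`; stop immediately
# if no dictionary word starts with the extended prefix.
search.from <- function(cell, visited, prefix) {
  prefix <- paste0(prefix, bog.letters[cell])
  if (!exists(prefix, envir = prefixes, inherits = FALSE)) return(invisible(NULL))
  if (exists(prefix, envir = words, inherits = FALSE)) assign(prefix, TRUE, envir = found)
  visited[cell] <- TRUE  # local copy, so backtracking is automatic
  for (nb in neighbours[[cell]]) {
    if (!visited[nb]) search.from(nb, visited, prefix)
  }
  invisible(NULL)
}

for (start in seq_along(bog.letters)) {
  search.from(start, logical(length(bog.letters)), "")
}
valid.words <- ls(found)
valid.words
```

With this toy dictionary the search finds "date", "lead", "tea", "ten" and "tens", and never explores paths starting with letters that begin no dictionary word. The work is bounded by the dictionary's prefixes rather than by the number of board paths, so a precomputed path list is not needed.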
Using the cpp_str_split function from the Rcpp Gallery, the running time is now down to 3 seconds for 2,644,520 paths.
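library(stringi)
# cpp_str_split() is the compiled helper from the Rcpp Gallery; it has to be
# compiled and loaded (for example with Rcpp::sourceCpp) before this runs.
# Random indices stand in for the real 13-letter paths, for timing only.
paths <- data.frame(matrix(sample(1:16, 13*2644520, TRUE), ncol=13))
# Transposing puts each path in a column, so the letters are read path by path
# and concatenated into one long string.
a1 <- stri_c(bog.letters[t(as.matrix(paths))], collapse="")
# Cut the long string back into 13-character candidates.
candidates <- cpp_str_split(a1, 13)[[1]]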
For comparison, the apply approach takes about 80 seconds on my laptop for the same 2,644,520 paths.
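To make the flatten-and-split idea concrete, here is a tiny base R illustration of the same two steps. It is only a sketch of the mechanics: `toy.paths` is a hypothetical two-row stand-in, and base `substring` replaces the compiled `cpp_str_split`, so no timing claim is implied.

```r
bog.letters <- c("t", "e", "n", "s", "d", "a", "i", "o",
                 "l", "e", "r", "o", "c", "f", "i", "e")

# Hypothetical stand-in: two 3-letter paths, one per row.
toy.paths <- data.frame(V1 = c(1, 5), V2 = c(2, 6), V3 = c(3, 7))
word.len <- ncol(toy.paths)

# Step 1: after t(), each path sits in a column, and R indexes matrices
# column by column, so the letters come out path by path.
flat <- paste(bog.letters[t(as.matrix(toy.paths))], collapse = "")

# Step 2: cut the long string into consecutive chunks of word.len characters.
starts <- seq(1, nchar(flat), by = word.len)
candidates <- substring(flat, starts, starts + word.len - 1)

flat
candidates
```

Here `flat` is "tendai" and `candidates` is "ten" and "dai", one string per original row.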