Algorithm: What set of tiles of length N can be used to generate the most amount of Scrabble-valid words?
I'm trying to create a function best_tiles which takes as input the number of tiles in your hand and returns the set of tiles that allows you to produce the most unique English-valid words, assuming you can only use each tile once.
For example, with the set of tiles in your hand (A, B, C) you can spell CAB, BAC, AB, and BA (all English words), so you can spell 4 unique words with that set. With (B, A, A), you can spell 5 words: ABA, BAA, AA, AB, and BA. The goal is to find the set of letters that lets you spell the largest number of English-valid words (without replacement).
So if 5 is the maximum number of words that can be spelled with any combination of letters for N = 3, running best_tiles(n = 3) would return B, A, A.
I want to know how to implement this efficiently. My current approach doesn't scale well at all with the number of letters.
I read in a word list. In this case, I'm using enable.txt from here: https://www.wordgamedictionary.com/enable/
import os

path = "enable.txt"
words = []
with open(path, encoding='utf8') as f:
    for values in f:
        words.append(list(values.strip().upper()))
I created a function word_in_tiles (h/t smack89), which returns whether it is possible to construct a given word from a given tile set:
import collections

def word_in_tiles(word, tiles):
    # Every letter of the word must occur at least as often in the tiles.
    tiles_counter = collections.Counter(tiles)
    return all(tiles_counter.get(ch, 0) >= cnt for ch, cnt in
               collections.Counter(word).items())
Then I created a function get_all_words, which generates a list of all the words one could spell from a word list and a tile set:
def get_all_words(tile_set, words):
    # words is a word list
    return [i for i in words if word_in_tiles(i, tile_set)]
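For instance, a quick sanity check against a tiny hand-made word list (a hypothetical example, not the full enable.txt):

sample_words = [list(w) for w in ["AA", "AB", "ABA", "BAA", "CAB"]]
print(get_all_words("BAA", sample_words))
# [['A', 'A'], ['A', 'B'], ['A', 'B', 'A'], ['B', 'A', 'A']] -- CAB needs a C we don't have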
The extremely naive way of identifying which tile set is "best" for three letters is the following:
I first create a list of every possible combination for a given length. So for length 3 I would do:
import string
import itertools

letters = string.ascii_lowercase
all_three_letter_combinations = list(itertools.combinations_with_replacement(letters, 3))

# Create a list of only words that are three letters or less
three_letter = [i for i in words if len(i) <= 3]

sequence_dict = dict()
for i in all_three_letter_combinations:
    string_test = "".join(i).upper()
    sequence_dict[i] = get_all_words(string_test, three_letter)
Then I remove the entries with no results and sort by the number of results:
res = {k: v for k, v in sequence_dict.items() if len(v) >= 1}

def GetMaxLength(res):
    return max((len(v), v, k) for k, v in res.items())[1:]

GetMaxLength(res)
You get the idea: for three letters, the tile set that produces the most English-valid words is T A E, which yields the words ['AE', 'AT', 'ATE', 'EAT', 'ET', 'ETA', 'TA', 'TAE', 'TEA'].
I want to be able to scale this up to N = 15. What is the best procedure for doing this?
If you choose with replacement you always get the same number of combinations, but you get more unique ones when the characters are distinct:
AA: AA, AA, A, A
AB: AB, BA, A, B
Because this gets very big very fast, rather than statically creating all the combinations, you can create a generator which yields the values as needed (as input to further calculations).
import string
import itertools

def gen_substrs_with_replacement(maxlen, max_chars=26):
    letters = string.ascii_lowercase[:max_chars]
    for length in range(1, maxlen + 1):  # start at 1 and end at N
        yield from itertools.combinations_with_replacement(letters, length)
However, if you only want the count of values, you can calculate it with the binomial formula with replacement.
Note that if you also want the substrings, you actually need the sum of the series with one fewer selection each time (ABCD: AAA, AA, A).
from math import comb as nchoosek

# N choose K with replacement, and every sub-list
def nCk_r(source_length, selections, sublist=False):
    if sublist and selections > 1:
        return nchoosek(source_length + selections - 1, selections) + nCk_r(source_length, selections - 1, True)
    return nchoosek(source_length + selections - 1, selections)
>>> list(gen_substrs_with_replacement(2, 3))
[('a',), ('b',), ('c',), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]
>>> len(list(gen_substrs_with_replacement(2, 3)))
9
>>> nCk_r(2,3,True)
9
>>> len(list(gen_substrs_with_replacement(12, 10)))
646645
>>> nCk_r(12,10,True)
646645
The goal should be to get your algorithm to run on the order of O(size of the enable list). That is, we don't want to waste time generating words which aren't in the list, so you can be sure your algorithm is efficient.
The first thing you should probably do is partition the list by word length.
So split your enable.txt file into files which contain all the 2-letter words, all the 3-letter words, and so on. Maybe name them enable_2.txt, enable_3.txt, etc.
length_map = collections.defaultdict(list)
with open("enable.txt", "r") as f:
    for word in f:
        word = word.strip()  # drop the trailing newline so the lengths are correct
        length_map[len(word)].append(word)

for n in length_map:
    with open(f"enable_{n}.txt", "w+") as f:
        f.write("\n".join(length_map[n]))
You don't have to do this, but it definitely helps speed up the next tasks.
Now, if you have n tiles, you want to read all of the enable_n.txt files that contain words of 2 to n letters.
If you didn't do the first part, just read the enable.txt file you have.
For each word in the file, you need to check whether your tiles contain all the letters in that word.
Here is a Python function you can use to check:
import collections

def word_in_tiles(word, tiles):
    tiles_counter = collections.Counter(tiles)
    return all(tiles_counter.get(ch, 0) >= cnt for ch, cnt in collections.Counter(word).items())
If the function returns True for any word, you can print that word.
This is much faster than generating all possible combinations of tiles.
In the worst case, this method is as fast as if you had somehow generated all the valid words already in enable.txt and then filtered out the ones of the wrong length.
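Putting the pieces together, here is a minimal sketch (my illustration of the approach above, assuming enable.txt sits in the working directory) that scores a single candidate tile set by counting how many words it can spell:

def score_tiles(tiles, path="enable.txt"):
    n = len(tiles)
    count = 0
    with open(path) as f:
        for line in f:
            word = line.strip().upper()
            # Only words that fit in the hand can possibly be spelled.
            if 0 < len(word) <= n and word_in_tiles(word, tiles):
                count += 1
    return count

print(score_tiles("TAE"))  # 9 for the enable list, per the question

The remaining (hard) part is deciding which candidate tile sets are worth scoring, which is what the answers below attack.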
This code can optimize n=15 in a couple of minutes using PyPy on my laptop, finding
10701 acdegilmnoprstu.
The idea is to do branch and bound, where at each node some letters are forced to be included and others are excluded. We derive an upper bound on the quality of each node by finding an order-preserving map f (preserving the partial order of multiset inclusion) from multisets of letters into a smaller partially ordered space, then counting the number of words we can get where f(word) is included in the best f(tiles). On the smaller space we can brute-force the problem with a fast convolution method (reminiscent of the FFT).
To find a good space, we greedily delete letters one at a time so as to affect as few words as possible, until the upper bound can be brute-forced.
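To make the sum-over-subsets step concrete, here is a tiny stand-alone sketch (my illustration, not part of the answer's code below) on a 3-letter alphabet; it ignores letter multiplicities, which happens to be fine for these particular words:

alphabet = ["a", "e", "t"]
word_list = ["ae", "at", "ate", "eat", "et", "eta", "ta", "tae", "tea"]

# vector[mask] starts as the number of words whose letter set is exactly `mask` ...
vector = [0] * (1 << len(alphabet))
for w in word_list:
    mask = 0
    for ch in w:
        mask |= 1 << alphabet.index(ch)
    vector[mask] += 1

# ... and after the in-place sum-over-subsets transform it holds the number of
# words whose letter set is any subset of `mask`.
for i in range(len(alphabet)):
    bit = 1 << i
    for mask in range(1 << len(alphabet)):
        if not (mask & bit):
            vector[mask | bit] += vector[mask]

print(vector[0b111])  # 9: every one of these words is spellable from {a, e, t}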
import array
import collections
import functools
import heapq
def count_occurrences_of_letters(raw_word):
occurs = collections.Counter()
word = []
for letter in raw_word:
word.append(letter + str(occurs[letter]))
occurs[letter] += 1
return word
def greedy_censorship_order(words):
hits = collections.defaultdict(set)
for index, word in enumerate(words):
for letter in word:
hits[letter].add(index)
order = []
while hits:
censored_letter = min(hits.keys(), key=lambda letter: len(hits[letter]))
order.append(censored_letter)
for index in hits[censored_letter]:
for letter in words[index]:
if letter != censored_letter:
hits[letter].remove(index)
del hits[censored_letter]
return order
def bitmap_from_word(word, alphabet):
bitmap = 0
censored = 0
for letter in word:
try:
bitmap |= 1 << alphabet.index(letter)
except ValueError:
censored += 1
return bitmap, censored
def sum_over_subsets(vector, dimension):
for i in range(dimension):
bit = 1 << i
for bitmap in range(1 << dimension):
if not (bitmap & bit):
vector[bitmap | bit] += vector[bitmap]
def count_set_bits(n):
return bin(n).count("1")
@functools.total_ordering
class Node:
def __init__(self, subset, n, unfiltered_words):
self.subset = subset
self.n = n
self.words = [word for word in unfiltered_words if len(word) <= n]
self.upper_bound = sum(not word for word in self.words)
if n == 0:
return
order = greedy_censorship_order(self.words)
if not order:
self.pivot = None
return
self.pivot = order[-1]
alphabet = order[-(n + 7) :]
zeros = [0] * (1 << len(alphabet))
vectors = [array.array("l", zeros) for i in range(n + 1)]
for word in self.words:
bitmap, censored = bitmap_from_word(word, alphabet)
for i in range(censored, n + 1):
vectors[i][bitmap] += 1
for vector in vectors:
sum_over_subsets(vector, len(alphabet))
self.upper_bound = max(
vectors[n - count_set_bits(bitmap)][bitmap]
for bitmap in range(1 << len(alphabet))
if count_set_bits(bitmap) <= n
)
def children(self):
if self.pivot is None:
return
yield Node(
self.subset, self.n, [word for word in self.words if self.pivot not in word]
)
yield Node(
self.subset | {self.pivot},
self.n - 1,
[
[letter for letter in word if letter != self.pivot]
for word in self.words
],
)
def __eq__(self, other):
return self.upper_bound == other.upper_bound
def __ne__(self, other):
return self.upper_bound != other.upper_bound
def __lt__(self, other):
return self.upper_bound > other.upper_bound
def solve(n, words):
heap = [Node(set(), n, words)]
while True:
top = heapq.heappop(heap)
print(top.upper_bound, "".join(sorted(letter[0] for letter in top.subset)))
if top.n == 0:
return
for child in top.children():
heapq.heappush(heap, child)
def main():
with open("enable.txt") as file:
raw_words = file.read().split()
words = [count_occurrences_of_letters(word) for word in raw_words]
solve(15, words)
if __name__ == "__main__":
main()
I feel like this is good enough!
Here is a log of my code running under PyPy:
0:00:00.000232
E
0:00:00.001251
ER
0:00:00.048733
EAT
0:00:00.208744
ESAT
0:00:00.087425
ESATL
0:00:00.132049
ESARTP
0:00:00.380296
ESARTOP
0:00:01.409129
ESIARTLP
0:00:03.433526
ESIARNTLP
0:00:10.391252
ESIARNTOLP
0:00:25.651012
ESIARNTOLDP
0:00:56.642405
ESIARNTOLCDP
0:01:57.257293
ESIARNTOLCDUP
0:03:55.933906
ESIARNTOLCDUPM
0:07:17.146036
ESIARNTOLCDUPMG
0:10:14.844347
ESIARNTOLCDUPMGH
0:13:34.722600
ESIARNTOLCDEUPMGH
0:18:14.215019
ESIARNTOLCDEUPMGSH
0:22:47.129284
ESIARNTOLCDEUPMGSHB
0:27:56.859511
ESIARNTOLCDEUPMGSHBYK
0:46:20.448502
ESIARNTOLCDEUPMGSHBYAK
0:57:15.213635
ESIARNTOLCDEUPMGSHIBYAT
1:09:55.530180
ESIARNTOLCDEUPMGSHIBYATF
1:18:35.209599
ESIARNTOLCDEUPMGSHIBYATRF
1:21:54.095119
ESIARNTOLCDEUPMGSHIBYATRFV
1:20:16.978411
ESIARNTOLCDEUPMGSHIBYAOTRFV
1:14:24.253660
ESIARNTOLCDEUPMGSHIBYAONTRFV
1:00:37.405571
The key improvements are these.
- I distinguish not just the letters, but how many times a given letter has been seen. That way, for each letter-occurrence I can either take it or move on (see the sketch after this list). This was an idea I got while commenting on David Eisenstat's solution.
- From him I also learned that pruning trees which cannot lead to an answer keeps the growth of the problem under control.
- The very first solution that I look at is simply all of the top letters. This is a good solution, so despite the search being depth-first, we prune well.
- I am careful to consolidate "exhausted tries" into a single record. This reduces the amount of data we have to keep around.
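As a tiny illustration (my own sketch, not part of the answer's code) of keying letters by how many times they have already been seen, so that a hand is just a set of such keys and the search can take or skip one key at a time:

def word_to_keys(word):
    seen = {}
    keys = []
    for letter in word:
        seen.setdefault(letter, 0)
        keys.append((letter, seen[letter]))
        seen[letter] += 1
    return keys

print(word_to_keys("BANANA"))
# [('B', 0), ('A', 0), ('N', 0), ('A', 1), ('N', 1), ('A', 2)]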
Here is the code.
import os
import datetime
path = "enable.txt"
words = []
with open(path) as f:
for values in f:
words.append(values.strip().upper())
key_count = {}
for word in words:
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
if key not in key_count:
key_count[key] = 1
else:
key_count[key] += 1
seen[letter] += 1
KEYS = sorted(key_count.keys(), key=lambda key: -key_count[key])
#print(KEYS)
#print(len(KEYS))
KEY_POS = {}
for i in range(len(KEYS)):
KEY_POS[KEYS[i]] = i
# Now we will build a trie. Every node has a list of words, and a dictionary
# from the next letter farther in the trie.
# BUT TRICK:, we will map each word to a sequence of numbers, and those numbers
# will be indexes into KEYS. This allows us to use the fact that a second 'e' is
# unlikely, so we can deal with that efficiently.
class Trie:
def __init__(self, path):
self.words = []
self.dict = {}
self.min_pos = -1
self.max_pos = -1
self.words = []
self.count_words = 0
self.path = path
def add_word (self, word):
trie = self
poses = []
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
poses.append(KEY_POS[(key)])
seen[letter] += 1
sorted_poses = sorted(poses);
for i in range(len(sorted_poses)):
trie.count_words += 1
pos = sorted_poses[i]
if pos not in trie.dict:
trie.dict[pos] = Trie(trie.path + KEYS[pos][0])
if trie.max_pos < pos:
trie.max_pos = pos
trie = trie.dict[pos]
trie.count_words += 1
trie.words.append(word)
base_trie = Trie('')
for word in words:
base_trie.add_word(word);
def best_solution (size):
def solve (subset, pos, best, partial):
found = sum(x[0] for x in partial)
upper_bound = sum(x[1] for x in partial)
if size <= len(subset) or upper_bound < best or len(KEYS) <= pos:
return (found, subset)
if best < found:
best = found
# Figure out our next calculations.
partial_include = []
partial_exclude = []
finalized_found = 0
for this_found, this_bound, this_trie in partial:
if this_trie is None:
# This is a generic record of already emptied tries
finalized_found += this_found
elif pos in this_trie.dict:
include_trie = this_trie.dict[pos]
partial_include.append((
this_found + len(include_trie.words),
include_trie.count_words + this_found,
include_trie
))
# We included the tally of found words in the previous partial.
# So do not double-count by including it again
partial_include.append((
0,
this_bound - include_trie.count_words - this_found,
this_trie
))
partial_exclude.append((
this_found,
this_bound - include_trie.count_words,
this_trie
))
elif this_found == this_bound:
finalized_found += this_found
else:
partial_include.append((
this_found,
this_bound,
this_trie
))
partial_exclude.append((
this_found,
this_bound,
this_trie
))
if 0 < finalized_found:
partial_include.append(
(finalized_found, finalized_found, None)
)
partial_exclude.append(
(finalized_found, finalized_found, None)
)
found_include, subset_include = solve(subset + [pos], pos+1, best, partial_include)
if best < found_include:
best = found_include
found_exclude, subset_exclude = solve(subset, pos+1, best, partial_exclude)
if found_include < found_exclude:
return (found_exclude, subset_exclude)
else:
return (found_include, subset_include)
count, subset = solve([], 0, 0, [(len(base_trie.words), base_trie.count_words, base_trie)])
return ''.join([KEYS[x][0] for x in subset])
for i in range(20):
start = datetime.datetime.now()
print(best_solution(i))
print(datetime.datetime.now() - start)
Here is a "dumb" sum-over-subsets that, in 33 seconds on my laptop, accumulates, for each count of distinct letters from 1 to 26, which selection of distinct letters yields the most words in the file "enable.txt". (The 32 seconds is itself a speed-up: switching to an in-place method brought my original code down from 6 minutes 45 seconds.)
Since btilly and David Eisenstat have already done the hard work of optimizing a search that also includes duplicates, we know that the information here is useful up to 16 letters.
from collections import defaultdict
def as_number(word):
word = word.lower();
n = 0
for c in word:
m = ord(c) - 97
if n & (1 << m):
return 0
else:
n |= 1 << m
return n
def get_letters(n):
letters = ""
i = 0
while n:
if n & 1:
letters += chr(97 + i)
n >>= 1
i += 1
return letters
def f(words, N):
hash = defaultdict(lambda: 0) #[0] * (1 << N)
for w in words:
num = as_number(w)
if num:
hash[num] += 1 #= -~hash[num]
dp = [hash.get(mask, 0) for mask in range(1 << N)]
for i in range(N):
for mask in range(1 << N):
if not (mask & (1 << i)):
dp[mask ^ (1 << i)] += dp[mask]
result = {}
for i in xrange(1, 1 << N):
k = bin(i).count("1")
if k in result:
if result[k]["best"] == dp[i]:
result[k]["best_letters"].append(get_letters(i))
elif result[k]["best"] < dp[i]:
result[k]["best"] = dp[i]
result[k]["best_letters"] = [get_letters(i)]
elif dp[i]:
result[k] = {
"best": dp[i],
"best_letters": [get_letters(i)]
}
return result
import os
import datetime
path = "enable.txt"
words = []
with open(path) as file:
for values in file:
words.append(values.strip())
start = datetime.datetime.now()
print f(words, 26)
print(datetime.datetime.now() - start)
Output:
// ♥ pypy py.py
{
2: {
'best': 2,
'best_letters': ['ab', 'de', 'ah', 'eh', 'al', 'am', 'em', 'an', 'en', 'do', 'ho', 'mo', 'no', 'er', 'is', 'os', 'at', 'it', 'mu', 'nu', 'ow', 'ay', 'oy']
},
3: {
'best': 9,
'best_letters': ['aet']
},
4: {
'best': 24,
'best_letters': ['aest']
},
5: {
'best': 66,
'best_letters': ['aelst']
},
6: {
'best': 150,
'best_letters': ['aeprst']
},
7: {
'best': 283,
'best_letters': ['aeoprst']
},
8: {
'best': 543,
'best_letters': ['aeilprst']
},
9: {
'best': 945,
'best_letters': ['aeilnprst']
},
10: {
'best': 1590,
'best_letters': ['aeilnoprst']
},
11: {
'best': 2557,
'best_letters': ['adeilnoprst']
},
12: {
'best': 3855,
'best_letters': ['acdeilnoprst']
},
13: {
'best': 5648,
'best_letters': ['acdeilnoprstu']
},
14: {
'best': 8001,
'best_letters': ['acdeilmnoprstu']
},
15: {
'best': 10701,
'best_letters': ['acdegilmnoprstu']
},
16: {
'best': 14060,
'best_letters': ['acdeghilmnoprstu']
},
17: {
'best': 17225,
'best_letters': ['abcdeghilmnoprstu']
},
18: {
'best': 20696,
'best_letters': ['abcdeghilmnoprstuy']
},
19: {
'best': 23723,
'best_letters': ['abcdeghiklmnoprstuy']
},
20: {
'best': 26542,
'best_letters': ['abcdefghiklmnoprstuy']
},
21: {
'best': 29501,
'best_letters': ['abcdefghiklmnoprstuwy']
},
22: {
'best': 31717,
'best_letters': ['abcdefghiklmnoprstuvwy']
},
23: {
'best': 32675,
'best_letters': ['abcdefghiklmnoprstuvwyz']
},
24: {
'best': 33548,
'best_letters': ['abcdefghiklmnoprstuvwxyz']
},
25: {
'best': 34299,
'best_letters': ['abcdefghijklmnoprstuvwxyz']
},
26: {
'best': 34816,
'best_letters': ['abcdefghijklmnopqrstuvwxyz']
}
}
0:00:32.295888
This is my previous answer with one more idea added that is borrowed from David Eisenstat: namely, start with a data structure that erases the distinctions caused by common letters. The speed-up is quite good.
Here is the full log of the code running under PyPy. Once it gets fairly far in, it goes very, very fast, because the data structure is very small. The slowest point is 23 letters.
0:00:00.000226
0:00:00.000095
ER
0:00:00.000514
EAT
0:00:00.014545
ESAT
0:00:00.069521
ESATL
0:00:00.063586
ESARTP
0:00:00.100273
ESARTOP
0:00:00.309665
ESIARTLP
0:00:01.171279
ESIARNTLP
0:00:02.435968
ESIARNTOLP
0:00:05.049897
ESIARNTOLDP
0:00:11.358067
ESIARNTOLCDP
0:00:23.378628
ESIARNTOLCDUP
0:00:38.672561
ESIARNTOLCDUPM
0:00:57.530691
ESIARNTOLCDUPMG
0:01:23.665592
ESIARNTOLCDUPMGH
0:01:43.571878
ESIARNTOLCDEUPMGH
0:02:02.831546
ESIARNTOLCDEUPMGSH
0:02:13.175045
ESIARNTOLCDEUPMGSHB
0:02:15.644423
ESIARNTOLCDEUPMGSHBY
0:02:25.681542
ESIARNTOLCDEUPMGSHBYK
0:02:38.858906
ESIARNTOLCDEUPMGSHBYAK
0:02:44.703865
ESIARNTOLCDEUPMGSHIBYAT
0:02:52.872105
ESIARNTOLCDEUPMGSHIBYATF
0:02:38.724424
ESIARNTOLCDEUPMGSHIBYATRF
0:02:19.892021
ESIARNTOLCDEUPMGSHIBYATRFV
0:02:01.069136
ESIARNTOLCDEUPMGSHIBYAOTRFV
0:01:39.992728
ESIARNTOLCDEUPMGSHIBYAONTRFV
0:01:09.800219
ESIARNTOLCDEUPMGSHIBYAONTRFVK
0:00:46.791619
ESIARNTOLCDEUPMGSHIBYAONTRFVKW
0:00:30.404072
ESIARNTOLCDEUPMGSHIBYAONTRFVLKW
0:00:17.932808
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWC
0:00:16.610114
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWEC
0:00:13.385111
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWECZ
0:00:11.731393
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZ
0:00:09.263752
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZD
0:00:08.459759
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZDP
0:00:06.778232
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZDPX
0:00:05.502753
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPX
0:00:05.504382
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXM
0:00:03.345941
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMU
0:00:02.786702
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUG
0:00:02.258980
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGQ
0:00:02.181486
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGQJ
0:00:02.069378
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGOHQ
0:00:01.678280
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGOHQJ
0:00:01.758940
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJ
0:00:01.015172
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJA
0:00:00.782816
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJAB
0:00:00.590803
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJABF
0:00:00.560564
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJABFT
0:00:00.789469
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFT
0:00:00.213274
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTR
0:00:00.142143
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTER
0:00:00.117293
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERL
0:00:00.062241
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLY
0:00:00.044667
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIY
0:00:00.032022
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYK
0:00:00.033368
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKC
0:00:00.028749
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCV
0:00:00.028905
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVW
0:00:00.038960
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVWZ
0:00:00.032474
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZ
0:00:00.031382
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZD
0:00:00.032725
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDG
0:00:00.028246
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGO
0:00:00.026071
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGON
0:00:00.023227
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONM
0:00:00.019831
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMP
0:00:00.015209
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPU
0:00:00.011805
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUE
0:00:00.013564
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEI
0:00:00.011815
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIA
0:00:00.005378
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIAB
0:00:00.013570
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABL
0:00:00.004920
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLH
0:00:00.002972
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHT
0:00:00.002923
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTF
0:00:00.003162
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSF
0:00:00.004121
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFR
0:00:00.004218
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJ
0:00:00.005604
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJC
0:00:00.006316
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCY
0:00:00.012737
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYN
0:00:00.010365
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNO
0:00:00.011111
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOI
0:00:00.013791
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIG
0:00:00.016119
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGD
0:00:00.017771
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDX
0:00:00.015684
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXW
0:00:00.014175
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZW
0:00:00.015253
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWU
0:00:00.009111
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEU
0:00:00.009990
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUK
0:00:00.013182
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKQ
0:00:00.003639
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKQV
0:00:00.002877
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQV
0:00:00.001968
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVM
0:00:00.001155
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMA
0:00:00.000890
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAP
0:00:00.000592
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPF
0:00:00.000894
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFB
0:00:00.000512
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTB
0:00:00.000452
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBE
0:00:00.000365
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBEI
0:00:00.000277
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBEIZ
0:00:00.000368
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBEIZK
0:00:00.000321
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBEIZKL
0:00:00.000275
Here is the code.
import re
import os
import datetime
path = "enable.txt"
words = []
with open(path) as f:
for values in f:
words.append(values.strip().upper())
key_count = {}
for word in words:
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
if key not in key_count:
key_count[key] = 1
else:
key_count[key] += 1
seen[letter] += 1
KEYS = sorted(key_count.keys(), key=lambda key: -key_count[key])
#print(KEYS)
#print(len(KEYS))
KEY_POS = {}
for i in range(len(KEYS)):
KEY_POS[KEYS[i]] = i
KEY_POS[KEYS[i]] = i
# Now we will build a trie. Every node has a list of words, and a dictionary
# from the next letter farther in the trie.
# BUT TRICK:, we will map each word to a sequence of numbers, and those numbers
# will be indexes into KEYS. This allows us to use the fact that a second 'e' is
# unlikely to store that efficiently.
class Trie:
def __init__(self, path):
self.words = []
self.dict = {}
self.words_len = 0
self.count_words = 0
self.path = path
def add_word (self, word):
trie = self
poses = []
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
poses.append(KEY_POS[(key)])
seen[letter] += 1
sorted_poses = sorted(poses);
for i in range(len(sorted_poses)):
trie.count_words += 1
pos = sorted_poses[i]
if pos not in trie.dict:
trie.dict[pos] = Trie(trie.path + KEYS[pos][0])
trie = trie.dict[pos]
trie.count_words += 1
trie.words_len += 1
trie.words.append(word)
def remove (self, pos):
trie = Trie(self.path)
trie.words = None
trie.dict = self.dict.copy()
trie.words_len = self.words_len
trie.count_words = self.count_words
trie.path = self.path
if pos in trie.dict:
trie.count_words -= trie.dict[pos].count_words
trie.dict.pop(pos)
return trie
def merge (self, other):
trie = Trie(self.path)
trie.words = None
trie.words_len = self.words_len + other.words_len
trie.count_words = self.count_words + other.count_words
trie.path = self.path
trie.dict = self.dict.copy()
for pos, subtrie in other.dict.items():
if pos in trie.dict:
trie.dict[pos] = trie.dict[pos].merge(subtrie)
else:
trie.dict[pos] = subtrie
return trie
def __repr__ (self):
answer = self.path + ":\n words_len: " + str(self.words_len) + "\n count_words:" + str(self.count_words) + " \n dict:\n"
def indent (string):
p = re.compile("^", re.M)
return p.sub(" ", string)
for pos in sorted(self.dict.keys()):
subtrie = self.dict[pos]
answer = answer + indent(str(pos) + ":\n" + subtrie.__repr__())
return answer
base_trie = Trie('')
for word in words:
base_trie.add_word(word)
base_tries = [base_trie]
last_trie = base_trie
for i in range(len(KEYS)):
subtrie = last_trie.dict[i]
last_trie = last_trie.remove(i).merge(subtrie)
base_tries.append(last_trie)
#print(base_tries[1].__repr__())
def best_solution (size):
def solve (subset, pos, best, partial):
found = sum(x[0] for x in partial)
upper_bound = sum(x[1] for x in partial)
if size <= len(subset) or upper_bound < best or len(KEYS) <= pos:
return (found, subset)
if best < found:
best = found
# Figure out our next calculations.
partial_include = []
partial_exclude = []
finalized_found = 0
for this_found, this_bound, this_trie in partial:
if this_trie is None:
# This is a generic record of already emptied tries
finalized_found += this_found
elif pos in this_trie.dict:
include_trie = this_trie.dict[pos]
partial_include.append((
this_found + include_trie.words_len,
include_trie.count_words + this_found,
include_trie
))
# We included the tally of found words in the previous partial.
# So do not double-count by including it again
partial_include.append((
0,
this_bound - include_trie.count_words - this_found,
this_trie
))
partial_exclude.append((
this_found,
this_bound - include_trie.count_words,
this_trie
))
elif this_found == this_bound:
finalized_found += this_found
else:
partial_include.append((
this_found,
this_bound,
this_trie
))
partial_exclude.append((
this_found,
this_bound,
this_trie
))
if 0 < finalized_found:
partial_include.append(
(finalized_found, finalized_found, None)
)
partial_exclude.append(
(finalized_found, finalized_found, None)
)
found_include, subset_include = solve(subset + [pos], pos+1, best, partial_include)
if best < found_include:
best = found_include
found_exclude, subset_exclude = solve(subset, pos+1, best, partial_exclude)
if found_include < found_exclude:
return (found_exclude, subset_exclude)
else:
return (found_include, subset_include)
best_count = 0
best_subset = []
start_subset = [i for i in range(size)]
while 0 < len(start_subset):
trie = base_tries[len(start_subset)].remove(start_subset[-1]+1)
count, subset = solve(list(start_subset), start_subset[-1]+2, best_count, [(trie.words_len, trie.count_words, trie)])
if best_count < count:
best_count = count
best_subset = subset
start_subset.pop()
return ''.join([KEYS[x][0] for x in best_subset])
for i in range(len(KEYS)):
start = datetime.datetime.now()
print(best_solution(i))
print(datetime.datetime.now() - start)
Here is an approach that returns the results listed above, but caps the execution time per result at 20 seconds (except for length 22, and possibly a couple of other outliers) :) ^^
The method uses bitsets and prioritizes letters by their frequency in the word list (words are hashed as 201-bit letter bitsets so that duplicate letters are included), partitioning the relevant word list by keeping only the word hashes that contain no newly excluded letter.
It's a hack that exits the recursion early, since the greedy branching looks quite skewed. I would welcome approaches that exploit the ideas here (nonetheless, I'd study btilly's and David Eisenstat's answers more carefully).
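As a small stand-alone illustration (mine, not the answer's code) of the duplicate-aware bitset encoding described above, where occurrence k of letter c maps to bit k*26 + (ord(c) - 97):

def encode(word):
    counts = {}
    n = 0
    for c in word.lower():
        m = ord(c) - 97
        n |= 1 << (counts.get(c, 0) * 26 + m)
        counts[c] = counts.get(c, 0) + 1
    return n

print(bin(encode("baa")))  # bits 0 (first a), 1 (first b) and 26 (second a) are set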
from collections import defaultdict
import time
def as_number(word):
word = word.lower();
n = 0
for c in word:
m = ord(c) - 97
if n & (1 << m):
return 0
else:
n |= 1 << m
return n
def as_number_with_duplicates(word):
counts = defaultdict(lambda: 0)
word = word.lower();
n = 0
for c in word:
m = ord(c) - 97
n |= 1 << (counts[c] * 26 + m)
counts[c] += 1
return n
def get_letters(n):
letters = ""
i = 0
while n:
if n & 1:
letters += chr(97 + i % 26)
n >>= 1
i += 1
return letters
def add_to_freq(freq, bits):
i = 0
while bits:
if bits & 1:
freq[i] += 1
bits >>= 1
i += 1
def arrange_freq(freq):
return sorted([(x, i) for i, x in enumerate(freq)])
def get_tuple(tpl, i, k, seen, num_to_count_map):
bits = tpl[2]["bits"]
freq = [0] * len(tpl[2]["freq"])
nums = tpl[2]["nums"]
bit = 1 << i
new_bits = bits ^ bit
if new_bits in seen:
return None
seen.add(new_bits)
new_nums = []
count = 0
for num in nums:
if not (num & bit):
new_nums.append(num)
count += num_to_count_map[num]
add_to_freq(freq, num)
return (count, tpl[1] - 1, {
"bits": new_bits,
"freq": arrange_freq(freq),
"nums": new_nums
})
def upper_index(a, val, left, right):
if left == right:
return left
mid = (left + right + 1) // 2
if a[mid][0] > val:
return upper_index(a, val, left, mid-1)
else:
return upper_index(a, val, mid, right)
# tpl = (word_count, letter_count, {bits, freq, nums})
def solve(tpl, k, seen, num_to_count_map, global_best):
if tpl[1] == k:
print "cand: %s, %s, %s" % (get_letters(tpl[2]["bits"]), tpl[1], tpl[0])
if tpl[0] > global_best["count"]:
global_best["count"] = tpl[0]
global_best["bits"] = tpl[2]["bits"]
if tpl[1] > k:
freq = tpl[2]["freq"]
start = 1 + upper_index(freq, 0, 0, len(freq))
for idx in xrange(start, len(freq)):
if time.time() > global_best["end"]:
return
_freq, i = freq[idx]
new_tpl = get_tuple(tpl, i, k, seen, num_to_count_map)
# Prune
if not new_tpl or new_tpl[0] < global_best["count"]:
continue
solve(new_tpl, k, seen, num_to_count_map, global_best)
def f(words, k):
num_to_count_map = defaultdict(lambda: 0)
nums = set()
all_bits = 0
freq = []
for w in words:
if len(w) > k:
continue
num = as_number_with_duplicates(w)
nums.add(num)
num_bits = len(bin(num)[2:])
if len(freq) < num_bits:
freq.extend([0] * (num_bits - len(freq)))
add_to_freq(freq, num)
num_to_count_map[num] += 1
all_bits |= num
tpl = (len(words), bin(all_bits).count("1"), {"bits": all_bits, "freq": arrange_freq(freq), "nums": nums})
global_best = {
"count": -float('inf'),
"bits": 0,
"end": time.time() + (5*60 if k == 22 else 20)
}
solve(tpl, k, set([all_bits]), num_to_count_map, global_best)
return global_best
def format(w):
return "".join(sorted(w.lower()))
def are_equal(w1, w2):
return format(w1) == format(w2)
"""
import os
import datetime
path = "enable.txt"
words = []
with open(path) as file:
for values in file:
words.append(values.strip())
start = datetime.datetime.now()
for i in xrange(2, 31):
result = f(words, i)
formatted = format(get_letters(result["bits"]))
print result["count"], formatted
print are_equal(formatted, btilly[i])
print(datetime.datetime.now() - start)
"""
# Code by btilly
import re
import os
import datetime
path = "enable.txt"
words = []
with open(path) as file:
for values in file:
words.append(values.strip().lower())
key_count = {}
for word in words:
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
if key not in key_count:
key_count[key] = 1
else:
key_count[key] += 1
seen[letter] += 1
KEYS = sorted(key_count.keys(), key=lambda key: -key_count[key])
#print(KEYS)
#print(len(KEYS))
KEY_POS = {}
for i in range(len(KEYS)):
KEY_POS[KEYS[i]] = i
KEY_POS[KEYS[i]] = i
# Now we will build a trie. Every node has a list of words, and a dictionary
# from the next letter farther in the trie.
# BUT TRICK:, we will map each word to a sequence of numbers, and those numbers
# will be indexes into KEYS. This allows us to use the fact that a second 'e' is
# unlikely to store that efficiently.
class Trie:
def __init__(self, path):
self.words = []
self.dict = {}
self.words_len = 0
self.count_words = 0
self.path = path
def add_word (self, word):
trie = self
poses = []
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
poses.append(KEY_POS[(key)])
seen[letter] += 1
sorted_poses = sorted(poses);
for i in range(len(sorted_poses)):
trie.count_words += 1
pos = sorted_poses[i]
if pos not in trie.dict:
trie.dict[pos] = Trie(trie.path + KEYS[pos][0])
trie = trie.dict[pos]
trie.count_words += 1
trie.words_len += 1
trie.words.append(word)
def remove (self, pos):
trie = Trie(self.path)
trie.words = None
trie.dict = self.dict.copy()
trie.words_len = self.words_len
trie.count_words = self.count_words
trie.path = self.path
if pos in trie.dict:
trie.count_words -= trie.dict[pos].count_words
trie.dict.pop(pos)
return trie
def merge (self, other):
trie = Trie(self.path)
trie.words = None
trie.words_len = self.words_len + other.words_len
trie.count_words = self.count_words + other.count_words
trie.path = self.path
trie.dict = self.dict.copy()
for pos, subtrie in other.dict.items():
if pos in trie.dict:
trie.dict[pos] = trie.dict[pos].merge(subtrie)
else:
trie.dict[pos] = subtrie
return trie
def __repr__ (self):
answer = self.path + ":\n words_len: " + str(self.words_len) + "\n count_words:" + str(self.count_words) + " \n dict:\n"
def indent (string):
p = re.compile("^", re.M)
return p.sub(" ", string)
for pos in sorted(self.dict.keys()):
subtrie = self.dict[pos]
answer = answer + indent(str(pos) + ":\n" + subtrie.__repr__())
return answer
base_trie = Trie('')
for word in words:
base_trie.add_word(word)
base_tries = [base_trie]
last_trie = base_trie
for i in range(len(KEYS)):
subtrie = last_trie.dict[i]
last_trie = last_trie.remove(i).merge(subtrie)
base_tries.append(last_trie)
#print(base_tries[1].__repr__())
def best_solution (size):
def solve (subset, pos, best, partial):
found = sum(x[0] for x in partial)
upper_bound = sum(x[1] for x in partial)
if size <= len(subset) or upper_bound < best or len(KEYS) <= pos:
return (found, subset)
if best < found:
best = found
# Figure out our next calculations.
partial_include = []
partial_exclude = []
finalized_found = 0
for this_found, this_bound, this_trie in partial:
if this_trie is None:
# This is a generic record of already emptied tries
finalized_found += this_found
elif pos in this_trie.dict:
include_trie = this_trie.dict[pos]
partial_include.append((
this_found + include_trie.words_len,
include_trie.count_words + this_found,
include_trie
))
# We included the tally of found words in the previous partial.
# So do not double-count by including it again
partial_include.append((
0,
this_bound - include_trie.count_words - this_found,
this_trie
))
partial_exclude.append((
this_found,
this_bound - include_trie.count_words,
this_trie
))
elif this_found == this_bound:
finalized_found += this_found
else:
partial_include.append((
this_found,
this_bound,
this_trie
))
partial_exclude.append((
this_found,
this_bound,
this_trie
))
if 0 < finalized_found:
partial_include.append(
(finalized_found, finalized_found, None)
)
partial_exclude.append(
(finalized_found, finalized_found, None)
)
found_include, subset_include = solve(subset + [pos], pos+1, best, partial_include)
if best < found_include:
best = found_include
found_exclude, subset_exclude = solve(subset, pos+1, best, partial_exclude)
if found_include < found_exclude:
return (found_exclude, subset_exclude)
else:
return (found_include, subset_include)
best_count = 0
best_subset = []
start_subset = [i for i in range(size)]
while 0 < len(start_subset):
trie = base_tries[len(start_subset)].remove(start_subset[-1]+1)
count, subset = solve(list(start_subset), start_subset[-1]+2, best_count, [(trie.words_len, trie.count_words, trie)])
if best_count < count:
best_count = count
best_subset = subset
start_subset.pop()
return ''.join([KEYS[x][0] for x in best_subset])
# End code by btilly
for i in range(25, len(KEYS)):
start = datetime.datetime.now()
btilly = best_solution(i)
print btilly
print(datetime.datetime.now() - start)
start = datetime.datetime.now()
result = f(words, i)
formatted = format(get_letters(result["bits"]))
print formatted
print(datetime.datetime.now() - start)
print are_equal(btilly, formatted)
print ""
我正在尝试创建一个函数 best_tiles
,它接收您手中的牌数和 return 允许您生成最多独特英语的牌组- 有效的单词,假设你只能使用每个方块一次。
例如,用你手上的那组牌(A, B, C)
你可以拼出CAB、BAC、AB、BA(都是英文单词),所以你可以拼出4个独特的与那套话。使用 (B, A, A)
,您可以拼出 5 个单词:ABA、BAA、AA、AB 和 BA。目标是找到一组字母,使您可以拼出最多的英语有效单词(无需替换)。
因此,如果 5 是 N = 3 的任意字母组合可以拼出的最大单词数,则 运行 best_tiles( n = 3 )
将 return B, A, A
。
我想知道如何有效地实现它?我目前的方法根本无法很好地适应字母数量。
我读了一个单词表。在这种情况下,我在这里使用 enable.txt:https://www.wordgamedictionary.com/enable/
import os
path = "enable.txt"
words = []
with open(path , encoding='utf8') as f:
for values in f:
words.append(list(values.strip().upper()))
我创建了一个函数 word_in_tiles
h/t smack89,它 return 是否有可能在给定的 tile set 中构造一个单词:
def word_in_tiles(word, tiles):
tiles_counter = collections.Counter(tiles)
return all(tiles_counter.get(ch, 0) == cnt for ch,cnt in
collections.Counter(word).items())
然后我创建了一个函数 get_all_words
,它生成一个列表,其中包含一个人可以从单词列表和拼贴集中拼写的所有可能单词。
def get_all_words(tile_set, words):
# words is a word list
return [i for i in words if word_in_tiles(i, tile_set)]
识别哪个 tileset 是三个字母的“最佳”的极其天真的方法如下:
我首先为给定的长度创建一个包含所有可能组合的列表。所以对于长度 3,我会这样做:
import string
import itertools
letters = string.ascii_lowercase
all_three_letter_combinations = list(itertools.combinations_with_replacement(letters, 3))
# Create a list of only words that are three letters are less
three_letter = [i for i in words if len(i) <= 3]
sequence_dict = dict()
for i in all_three_letter_combinations:
string_test = "".join(i).upper()
sequence_dict[i] = get_all_words(string_test, three_letter)
然后去掉没有长度的值,按照结果的长度排序:
res = {k: v for k, v in sequence_dict.items() if len(v) >= 1}
def GetMaxLength(res):
return max((len(v), v, k) for k, v in res.items())[1:]
获取最大长度(分辨率)
你明白了,对于三个字母,产生最多英语有效单词的 tile-set 是 T A E
,它可以产生以下单词 ['AE', 'AT', 'ATE', 'EAT', 'ET', 'ETA', 'TA', 'TAE', 'TEA']
我希望能够将其扩展到 N = 15。执行此操作的最佳程序是什么?
如果您使用替换进行选择,您将始终获得相同数量的组合,但使用独特字符的独特组合更多。
AA
:AA,AA,A,A
AB
:AB,BA,A,B
因为这会变得非常大,非常快,而不是静态地创建所有组合,您可以创建一个生成器来生成所有值 as-needed(作为其他计算的输入) .
import string
import itertools
def gen_substrs_with_replacement(maxlen, max_chars=26):
letters = string.ascii_lowercase[:max_chars]
for length in range(1, maxlen + 1): # start at 1 and end at N
yield from itertools.combinations_with_replacement(letters, length)
但是,如果您只想要 count 个值,您可以使用带替换的二项式公式计算它们。
请注意,如果您也想要子字符串,您实际上需要每次选择较少的系列的总和(ABCD
:AAA
AA
A
).
from math import comb as nchoosek
# N nhoose K with replacement and every sub-list
def nCk_r(source_length, selections, sublist=False):
if sublist and selections > 1:
return nchoosek(source_length + selections - 1, selections) + nCk_r(source_length, selections - 1, True)
return nchoosek(source_length + selections -1, selections)
>>> list(gen_substrs_with_replacement(2, 3))
[('a',), ('b',), ('c',), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]
>>> len(list(gen_substrs_with_replacement(2, 3)))
9
>>> nCk_r(2,3,True)
9
>>> len(list(gen_substrs_with_replacement(12, 10)))
646645
>>> nCk_r(12,10,True)
646645
目标应该是使您的算法达到 运行 O(enablelist_size) 的顺序。那就是我们不想浪费时间生成不在列表中的单词,这样你就可以确保你的算法是高效的。
您可能应该做的第一件事是按字数划分列表。
因此,将您的
enable.txt
文件分成包含所有 2 个字母的单词、3 个字母的单词等的文件。也许将它们命名为enable_2.txt
、enable_3.txt
等.length_map = collections.defaultdict(list) with open("enable.txt", "r") as f: for word in f: length_map[len(word)].append(word) for n in length_map: with open(f"enable_{n}.txt", "w+") as f: f.write("\n".join(length_map[c]))
你不必这样做,但它确实有助于加快接下来的任务
现在,如果您有
n
个图块,您想要阅读包含2
到n
个字母词的所有enable_n.txt
个文件。如果您没有完成第一部分,那么只需阅读您拥有的
enable.txt
文件.对于文件中的每个单词,您需要检查您的图块是否包含该单词中的所有字母。
这是一个 python 函数,您可以用它来检查:
import collections def word_in_tiles(word, tiles): tiles_counter = collections.Counter(tiles) return all(tiles_counter.get(ch, 0) == cnt for ch,cnt in collections.Counter(word).items())
如果函数 returns
True
任何单词,你可以打印那个单词。
这种方法比生成所有可能的图块组合要快得多。
在最坏的情况下,此方法的速度就好像您以某种方式生成了已经在 enable.txt
中的所有有效单词,然后过滤掉那些长度不正确的单词一样快。
这段代码可以在我的笔记本电脑上使用 PyPy 在几分钟内优化 n=15,发现
10701 acdegilmnoprstu.
想法是进行分支定界,在每个节点处强制包含一些字母,而排除其他字母。我们通过找到一个 order-preserving 映射 f
(保留多重集包含的偏序)从字母的多重集到更小的偏序 space,得出每个节点质量的上限,然后计算我们可以获得的单词数量,其中 f(word)
包含在最佳 f(tiles)
中。在较小的 space 上,我们可以使用快速卷积方法(让人联想到 FFT)来暴力解决问题。
为了找到一个好的space,我们贪婪地一次一个地删除字母以影响尽可能少的单词,直到上限可以被暴力破解。
import array
import collections
import functools
import heapq
def count_occurrences_of_letters(raw_word):
occurs = collections.Counter()
word = []
for letter in raw_word:
word.append(letter + str(occurs[letter]))
occurs[letter] += 1
return word
def greedy_censorship_order(words):
hits = collections.defaultdict(set)
for index, word in enumerate(words):
for letter in word:
hits[letter].add(index)
order = []
while hits:
censored_letter = min(hits.keys(), key=lambda letter: len(hits[letter]))
order.append(censored_letter)
for index in hits[censored_letter]:
for letter in words[index]:
if letter != censored_letter:
hits[letter].remove(index)
del hits[censored_letter]
return order
def bitmap_from_word(word, alphabet):
bitmap = 0
censored = 0
for letter in word:
try:
bitmap |= 1 << alphabet.index(letter)
except ValueError:
censored += 1
return bitmap, censored
def sum_over_subsets(vector, dimension):
for i in range(dimension):
bit = 1 << i
for bitmap in range(1 << dimension):
if not (bitmap & bit):
vector[bitmap | bit] += vector[bitmap]
def count_set_bits(n):
return bin(n).count("1")
@functools.total_ordering
class Node:
def __init__(self, subset, n, unfiltered_words):
self.subset = subset
self.n = n
self.words = [word for word in unfiltered_words if len(word) <= n]
self.upper_bound = sum(not word for word in self.words)
if n == 0:
return
order = greedy_censorship_order(self.words)
if not order:
self.pivot = None
return
self.pivot = order[-1]
alphabet = order[-(n + 7) :]
zeros = [0] * (1 << len(alphabet))
vectors = [array.array("l", zeros) for i in range(n + 1)]
for word in self.words:
bitmap, censored = bitmap_from_word(word, alphabet)
for i in range(censored, n + 1):
vectors[i][bitmap] += 1
for vector in vectors:
sum_over_subsets(vector, len(alphabet))
self.upper_bound = max(
vectors[n - count_set_bits(bitmap)][bitmap]
for bitmap in range(1 << len(alphabet))
if count_set_bits(bitmap) <= n
)
def children(self):
if self.pivot is None:
return
yield Node(
self.subset, self.n, [word for word in self.words if self.pivot not in word]
)
yield Node(
self.subset | {self.pivot},
self.n - 1,
[
[letter for letter in word if letter != self.pivot]
for word in self.words
],
)
def __eq__(self, other):
return self.upper_bound == other.upper_bound
def __ne__(self, other):
return self.upper_bound != other.upper_bound
def __lt__(self, other):
return self.upper_bound > other.upper_bound
def solve(n, words):
heap = [Node(set(), n, words)]
while True:
top = heapq.heappop(heap)
print(top.upper_bound, "".join(sorted(letter[0] for letter in top.subset)))
if top.n == 0:
return
for child in top.children():
heapq.heappush(heap, child)
def main():
with open("enable.txt") as file:
raw_words = file.read().split()
words = [count_occurrences_of_letters(word) for word in raw_words]
solve(15, words)
if __name__ == "__main__":
main()
我觉得这样就够了!
这是我在 PyPy 下的代码 运行 的日志:
0:00:00.000232
E
0:00:00.001251
ER
0:00:00.048733
EAT
0:00:00.208744
ESAT
0:00:00.087425
ESATL
0:00:00.132049
ESARTP
0:00:00.380296
ESARTOP
0:00:01.409129
ESIARTLP
0:00:03.433526
ESIARNTLP
0:00:10.391252
ESIARNTOLP
0:00:25.651012
ESIARNTOLDP
0:00:56.642405
ESIARNTOLCDP
0:01:57.257293
ESIARNTOLCDUP
0:03:55.933906
ESIARNTOLCDUPM
0:07:17.146036
ESIARNTOLCDUPMG
0:10:14.844347
ESIARNTOLCDUPMGH
0:13:34.722600
ESIARNTOLCDEUPMGH
0:18:14.215019
ESIARNTOLCDEUPMGSH
0:22:47.129284
ESIARNTOLCDEUPMGSHB
0:27:56.859511
ESIARNTOLCDEUPMGSHBYK
0:46:20.448502
ESIARNTOLCDEUPMGSHBYAK
0:57:15.213635
ESIARNTOLCDEUPMGSHIBYAT
1:09:55.530180
ESIARNTOLCDEUPMGSHIBYATF
1:18:35.209599
ESIARNTOLCDEUPMGSHIBYATRF
1:21:54.095119
ESIARNTOLCDEUPMGSHIBYATRFV
1:20:16.978411
ESIARNTOLCDEUPMGSHIBYAOTRFV
1:14:24.253660
ESIARNTOLCDEUPMGSHIBYAONTRFV
1:00:37.405571
关键的改进是这些。
- 我不仅区分字母,还区分字母被看过多少次。因此,我可以接受或继续前进的每封信。这是我在评论 David Eisenstat 的解决方案时产生的想法。
- 我还从他那里了解到,修剪掉无法得出答案的树可以很好地控制问题的发展。
- 我看到的第一个解决方案就是所有最上面的字母。这是一个很好的解决方案,所以尽管它是深度优先的,但我们会很好地修剪。
- 我很小心地将“用尽的尝试”合并为一条记录。这减少了我们必须丢弃的数据量。
这是代码。
import os
import datetime
path = "enable.txt"
words = []
with open(path) as f:
for values in f:
words.append(values.strip().upper())
key_count = {}
for word in words:
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
if key not in key_count:
key_count[key] = 1
else:
key_count[key] += 1
seen[letter] += 1
KEYS = sorted(key_count.keys(), key=lambda key: -key_count[key])
#print(KEYS)
#print(len(KEYS))
KEY_POS = {}
for i in range(len(KEYS)):
KEY_POS[KEYS[i]] = i
# Now we will build a trie. Every node has a list of words, and a dictionary
# from the next letter farther in the trie.
# BUT TRICK:, we will map each word to a sequence of numbers, and those numbers
# will be indexes into KEYS. This allows us to use the fact that a second 'e' is
# unlikely, so we can deal with that efficiently.
class Trie:
def __init__(self, path):
self.words = []
self.dict = {}
self.min_pos = -1
self.max_pos = -1
self.words = []
self.count_words = 0
self.path = path
def add_word (self, word):
trie = self
poses = []
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
poses.append(KEY_POS[(key)])
seen[letter] += 1
sorted_poses = sorted(poses);
for i in range(len(sorted_poses)):
trie.count_words += 1
pos = sorted_poses[i]
if pos not in trie.dict:
trie.dict[pos] = Trie(trie.path + KEYS[pos][0])
if trie.max_pos < pos:
trie.max_pos = pos
trie = trie.dict[pos]
trie.count_words += 1
trie.words.append(word)
base_trie = Trie('')
for word in words:
base_trie.add_word(word);
def best_solution (size):
def solve (subset, pos, best, partial):
found = sum(x[0] for x in partial)
upper_bound = sum(x[1] for x in partial)
if size <= len(subset) or upper_bound < best or len(KEYS) <= pos:
return (found, subset)
if best < found:
best = found
# Figure out our next calculations.
partial_include = []
partial_exclude = []
finalized_found = 0
for this_found, this_bound, this_trie in partial:
if this_trie is None:
# This is a generic record of already emptied tries
finalized_found += this_found
elif pos in this_trie.dict:
include_trie = this_trie.dict[pos]
partial_include.append((
this_found + len(include_trie.words),
include_trie.count_words + this_found,
include_trie
))
# We included the tally of found words in the previous partial.
# So do not double-count by including it again
partial_include.append((
0,
this_bound - include_trie.count_words - this_found,
this_trie
))
partial_exclude.append((
this_found,
this_bound - include_trie.count_words,
this_trie
))
elif this_found == this_bound:
finalized_found += this_found
else:
partial_include.append((
this_found,
this_bound,
this_trie
))
partial_exclude.append((
this_found,
this_bound,
this_trie
))
if 0 < finalized_found:
partial_include.append(
(finalized_found, finalized_found, None)
)
partial_exclude.append(
(finalized_found, finalized_found, None)
)
found_include, subset_include = solve(subset + [pos], pos+1, best, partial_include)
if best < found_include:
best = found_include
found_exclude, subset_exclude = solve(subset, pos+1, best, partial_exclude)
if found_include < found_exclude:
return (found_exclude, subset_exclude)
else:
return (found_include, subset_include)
count, subset = solve([], 0, 0, [(len(base_trie.words), base_trie.count_words, base_trie)])
return ''.join([KEYS[x][0] for x in subset])
for i in range(20):
start = datetime.datetime.now()
print(best_solution(i))
print(datetime.datetime.now() - start)
这里是一个“笨蛋”sum-over-subsets,它会在 33 秒内从 1 到 26 的每次计数中累积不同字母的选择,从而在文件“enable.txt”中产生最多的单词在我的笔记本电脑上。 (32 秒是一个加速
由于 btilly 和 David Eisenstat 已经完成了优化搜索的艰巨工作,其中还包括重复项,我们知道此处最多 16 个字母的信息很有用。
from collections import defaultdict
def as_number(word):
word = word.lower();
n = 0
for c in word:
m = ord(c) - 97
if n & (1 << m):
return 0
else:
n |= 1 << m
return n
def get_letters(n):
letters = ""
i = 0
while n:
if n & 1:
letters += chr(97 + i)
n >>= 1
i += 1
return letters
def f(words, N):
hash = defaultdict(lambda: 0) #[0] * (1 << N)
for w in words:
num = as_number(w)
if num:
hash[num] += 1 #= -~hash[num]
dp = [hash.get(mask, 0) for mask in range(1 << N)]
for i in range(N):
for mask in range(1 << N):
if not (mask & (1 << i)):
dp[mask ^ (1 << i)] += dp[mask]
result = {}
for i in xrange(1, 1 << N):
k = bin(i).count("1")
if k in result:
if result[k]["best"] == dp[i]:
result[k]["best_letters"].append(get_letters(i))
elif result[k]["best"] < dp[i]:
result[k]["best"] = dp[i]
result[k]["best_letters"] = [get_letters(i)]
elif dp[i]:
result[k] = {
"best": dp[i],
"best_letters": [get_letters(i)]
}
return result
import os
import datetime
path = "enable.txt"
words = []
with open(path) as file:
for values in file:
words.append(values.strip())
start = datetime.datetime.now()
print f(words, 26)
print(datetime.datetime.now() - start)
输出:
// ♥ pypy py.py
{
2: {
'best': 2,
'best_letters': ['ab', 'de', 'ah', 'eh', 'al', 'am', 'em', 'an', 'en', 'do', 'ho', 'mo', 'no', 'er', 'is', 'os', 'at', 'it', 'mu', 'nu', 'ow', 'ay', 'oy']
},
3: {
'best': 9,
'best_letters': ['aet']
},
4: {
'best': 24,
'best_letters': ['aest']
},
5: {
'best': 66,
'best_letters': ['aelst']
},
6: {
'best': 150,
'best_letters': ['aeprst']
},
7: {
'best': 283,
'best_letters': ['aeoprst']
},
8: {
'best': 543,
'best_letters': ['aeilprst']
},
9: {
'best': 945,
'best_letters': ['aeilnprst']
},
10: {
'best': 1590,
'best_letters': ['aeilnoprst']
},
11: {
'best': 2557,
'best_letters': ['adeilnoprst']
},
12: {
'best': 3855,
'best_letters': ['acdeilnoprst']
},
13: {
'best': 5648,
'best_letters': ['acdeilnoprstu']
},
14: {
'best': 8001,
'best_letters': ['acdeilmnoprstu']
},
15: {
'best': 10701,
'best_letters': ['acdegilmnoprstu']
},
16: {
'best': 14060,
'best_letters': ['acdeghilmnoprstu']
},
17: {
'best': 17225,
'best_letters': ['abcdeghilmnoprstu']
},
18: {
'best': 20696,
'best_letters': ['abcdeghilmnoprstuy']
},
19: {
'best': 23723,
'best_letters': ['abcdeghiklmnoprstuy']
},
20: {
'best': 26542,
'best_letters': ['abcdefghiklmnoprstuy']
},
21: {
'best': 29501,
'best_letters': ['abcdefghiklmnoprstuwy']
},
22: {
'best': 31717,
'best_letters': ['abcdefghiklmnoprstuvwy']
},
23: {
'best': 32675,
'best_letters': ['abcdefghiklmnoprstuvwyz']
},
24: {
'best': 33548,
'best_letters': ['abcdefghiklmnoprstuvwxyz']
},
25: {
'best': 34299,
'best_letters': ['abcdefghijklmnoprstuvwxyz']
},
26: {
'best': 34816,
'best_letters': ['abcdefghijklmnopqrstuvwxyz']
}
}
0:00:32.295888
这是我之前的回答,添加了另一个借鉴自 David Eisenstat 的想法。这个想法是从一种数据结构开始的,这种数据结构可以消除由普通字母引起的差异。提速还不错。
这是 PyPy 下代码 运行 的完整日志。一旦你走得很远,它就会非常非常快,因为数据结构非常小。最慢的是 23 个字母。
0:00:00.000226
0:00:00.000095
ER
0:00:00.000514
EAT
0:00:00.014545
ESAT
0:00:00.069521
ESATL
0:00:00.063586
ESARTP
0:00:00.100273
ESARTOP
0:00:00.309665
ESIARTLP
0:00:01.171279
ESIARNTLP
0:00:02.435968
ESIARNTOLP
0:00:05.049897
ESIARNTOLDP
0:00:11.358067
ESIARNTOLCDP
0:00:23.378628
ESIARNTOLCDUP
0:00:38.672561
ESIARNTOLCDUPM
0:00:57.530691
ESIARNTOLCDUPMG
0:01:23.665592
ESIARNTOLCDUPMGH
0:01:43.571878
ESIARNTOLCDEUPMGH
0:02:02.831546
ESIARNTOLCDEUPMGSH
0:02:13.175045
ESIARNTOLCDEUPMGSHB
0:02:15.644423
ESIARNTOLCDEUPMGSHBY
0:02:25.681542
ESIARNTOLCDEUPMGSHBYK
0:02:38.858906
ESIARNTOLCDEUPMGSHBYAK
0:02:44.703865
ESIARNTOLCDEUPMGSHIBYAT
0:02:52.872105
ESIARNTOLCDEUPMGSHIBYATF
0:02:38.724424
ESIARNTOLCDEUPMGSHIBYATRF
0:02:19.892021
ESIARNTOLCDEUPMGSHIBYATRFV
0:02:01.069136
ESIARNTOLCDEUPMGSHIBYAOTRFV
0:01:39.992728
ESIARNTOLCDEUPMGSHIBYAONTRFV
0:01:09.800219
ESIARNTOLCDEUPMGSHIBYAONTRFVK
0:00:46.791619
ESIARNTOLCDEUPMGSHIBYAONTRFVKW
0:00:30.404072
ESIARNTOLCDEUPMGSHIBYAONTRFVLKW
0:00:17.932808
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWC
0:00:16.610114
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWEC
0:00:13.385111
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWECZ
0:00:11.731393
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZ
0:00:09.263752
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZD
0:00:08.459759
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZDP
0:00:06.778232
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZDPX
0:00:05.502753
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPX
0:00:05.504382
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXM
0:00:03.345941
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMU
0:00:02.786702
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUG
0:00:02.258980
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGQ
0:00:02.181486
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGQJ
0:00:02.069378
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGOHQ
0:00:01.678280
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGOHQJ
0:00:01.758940
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJ
0:00:01.015172
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJA
0:00:00.782816
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJAB
0:00:00.590803
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJABF
0:00:00.560564
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJABFT
0:00:00.789469
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFT
0:00:00.213274
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTR
0:00:00.142143
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTER
0:00:00.117293
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERL
0:00:00.062241
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLY
0:00:00.044667
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIY
0:00:00.032022
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYK
0:00:00.033368
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKC
0:00:00.028749
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCV
0:00:00.028905
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVW
0:00:00.038960
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVWZ
0:00:00.032474
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZ
0:00:00.031382
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZD
0:00:00.032725
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDG
0:00:00.028246
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGO
0:00:00.026071
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGON
0:00:00.023227
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONM
0:00:00.019831
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMP
0:00:00.015209
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPU
0:00:00.011805
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUE
0:00:00.013564
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEI
0:00:00.011815
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIA
0:00:00.005378
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIAB
0:00:00.013570
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABL
0:00:00.004920
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLH
0:00:00.002972
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHT
0:00:00.002923
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTF
0:00:00.003162
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSF
0:00:00.004121
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFR
0:00:00.004218
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJ
0:00:00.005604
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJC
0:00:00.006316
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCY
0:00:00.012737
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYN
0:00:00.010365
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNO
0:00:00.011111
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOI
0:00:00.013791
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIG
0:00:00.016119
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGD
0:00:00.017771
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDX
0:00:00.015684
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXW
0:00:00.014175
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZW
0:00:00.015253
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWU
0:00:00.009111
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEU
0:00:00.009990
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUK
0:00:00.013182
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKQ
0:00:00.003639
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKQV
0:00:00.002877
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQV
0:00:00.001968
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVM
0:00:00.001155
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMA
0:00:00.000890
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAP
0:00:00.000592
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPF
0:00:00.000894
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFB
0:00:00.000512
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTB
0:00:00.000452
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBE
0:00:00.000365
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBEI
0:00:00.000277
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBEIZ
0:00:00.000368
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBEIZK
0:00:00.000321
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBEIZKL
0:00:00.000275
Here is the code:
import re
import os
import datetime
path = "enable.txt"
words = []
with open(path) as f:
for values in f:
words.append(values.strip().upper())
key_count = {}
for word in words:
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
if key not in key_count:
key_count[key] = 1
else:
key_count[key] += 1
seen[letter] += 1
KEYS = sorted(key_count.keys(), key=lambda key: -key_count[key])
#print(KEYS)
#print(len(KEYS))
KEY_POS = {}
for i in range(len(KEYS)):
KEY_POS[KEYS[i]] = i
# Now we will build a trie. Every node has a list of words, and a dictionary
# from the next letter farther in the trie.
# BUT TRICK: we will map each word to a sequence of numbers, and those numbers
# will be indexes into KEYS. This lets us use the fact that a second 'e' is
# unlikely, in order to store that efficiently.
class Trie:
def __init__(self, path):
self.words = []
self.dict = {}
self.words_len = 0
self.count_words = 0
self.path = path
def add_word (self, word):
trie = self
poses = []
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
poses.append(KEY_POS[(key)])
seen[letter] += 1
sorted_poses = sorted(poses);
for i in range(len(sorted_poses)):
trie.count_words += 1
pos = sorted_poses[i]
if pos not in trie.dict:
trie.dict[pos] = Trie(trie.path + KEYS[pos][0])
trie = trie.dict[pos]
trie.count_words += 1
trie.words_len += 1
trie.words.append(word)
def remove (self, pos):
trie = Trie(self.path)
trie.words = None
trie.dict = self.dict.copy()
trie.words_len = self.words_len
trie.count_words = self.count_words
trie.path = self.path
if pos in trie.dict:
trie.count_words -= trie.dict[pos].count_words
trie.dict.pop(pos)
return trie
def merge (self, other):
trie = Trie(self.path)
trie.words = None
trie.words_len = self.words_len + other.words_len
trie.count_words = self.count_words + other.count_words
trie.path = self.path
trie.dict = self.dict.copy()
for pos, subtrie in other.dict.items():
if pos in trie.dict:
trie.dict[pos] = trie.dict[pos].merge(subtrie)
else:
trie.dict[pos] = subtrie
return trie
def __repr__ (self):
answer = self.path + ":\n words_len: " + str(self.words_len) + "\n count_words:" + str(self.count_words) + " \n dict:\n"
def indent (string):
p = re.compile("^", re.M)
return p.sub(" ", string)
for pos in sorted(self.dict.keys()):
subtrie = self.dict[pos]
answer = answer + indent(str(pos) + ":\n" + subtrie.__repr__())
return answer
base_trie = Trie('')
for word in words:
base_trie.add_word(word)
base_tries = [base_trie]
last_trie = base_trie
for i in range(len(KEYS)):
subtrie = last_trie.dict[i]
last_trie = last_trie.remove(i).merge(subtrie)
base_tries.append(last_trie)
#print(base_tries[1].__repr__())
def best_solution (size):
def solve (subset, pos, best, partial):
found = sum(x[0] for x in partial)
upper_bound = sum(x[1] for x in partial)
if size <= len(subset) or upper_bound < best or len(KEYS) <= pos:
return (found, subset)
if best < found:
best = found
# Figure out our next calculations.
partial_include = []
partial_exclude = []
finalized_found = 0
for this_found, this_bound, this_trie in partial:
if this_trie is None:
# This is a generic record of already emptied tries
finalized_found += this_found
elif pos in this_trie.dict:
include_trie = this_trie.dict[pos]
partial_include.append((
this_found + include_trie.words_len,
include_trie.count_words + this_found,
include_trie
))
# We included the tally of found words in the previous partial.
# So do not double-count by including it again
partial_include.append((
0,
this_bound - include_trie.count_words - this_found,
this_trie
))
partial_exclude.append((
this_found,
this_bound - include_trie.count_words,
this_trie
))
elif this_found == this_bound:
finalized_found += this_found
else:
partial_include.append((
this_found,
this_bound,
this_trie
))
partial_exclude.append((
this_found,
this_bound,
this_trie
))
if 0 < finalized_found:
partial_include.append(
(finalized_found, finalized_found, None)
)
partial_exclude.append(
(finalized_found, finalized_found, None)
)
found_include, subset_include = solve(subset + [pos], pos+1, best, partial_include)
if best < found_include:
best = found_include
found_exclude, subset_exclude = solve(subset, pos+1, best, partial_exclude)
if found_include < found_exclude:
return (found_exclude, subset_exclude)
else:
return (found_include, subset_include)
best_count = 0
best_subset = []
start_subset = [i for i in range(size)]
while 0 < len(start_subset):
trie = base_tries[len(start_subset)].remove(start_subset[-1]+1)
count, subset = solve(list(start_subset), start_subset[-1]+2, best_count, [(trie.words_len, trie.count_words, trie)])
if best_count < count:
best_count = count
best_subset = subset
start_subset.pop()
return ''.join([KEYS[x][0] for x in best_subset])
for i in range(len(KEYS)):
start = datetime.datetime.now()
print(best_solution(i))
print(datetime.datetime.now() - start)
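To make the (letter, nth-occurrence) key trick described in the comments above concrete, here is a minimal standalone sketch; word_to_keys is a hypothetical helper name of mine, and it mirrors the loop inside add_word up to (but not including) the KEY_POS lookup:
import collections

# Illustration only: the n-th repeat of a letter becomes the key (letter, n),
# so a second 'E' is a different, much rarer key than the first 'E'.
def word_to_keys(word):
    seen = collections.Counter()
    keys = []
    for letter in word:
        keys.append((letter, seen[letter]))
        seen[letter] += 1
    return keys

print(word_to_keys("ABBA"))  # [('A', 0), ('B', 0), ('B', 1), ('A', 1)]
Sorting KEYS by how often each such key occurs in the word list is what lets many words share prefixes near the root of the trie.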
Here is an approach; it returns its results within a fixed time budget set in the code below (20 seconds per tile count, and 5 minutes for k = 22).
The method uses bitsets and prioritizes letters by their frequency in the word list (each word is hashed into a 201-bit letter bitset so that duplicate letters are represented), and it partitions the relevant word list at each step by keeping only the word hashes that do not contain the newly excluded letter.
It is a hack that exits the recursion early, because the greedy branching looks quite skewed. I would welcome approaches that exploit the ideas here (that said, I would look more carefully at btilly's and David Eisenstat's answers).
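Before the full listing, here is a small self-contained sketch of the duplicate-letter bitset described above (word_bits is a name I am using for illustration; it mirrors as_number_with_duplicates in the code below, and the filtering line mirrors the test inside get_tuple):
from collections import defaultdict

# Illustration only: the n-th occurrence of a letter lands in the n-th
# 26-bit block, so duplicate letters get bits of their own.
def word_bits(word):
    counts = defaultdict(int)
    n = 0
    for c in word.lower():
        n |= 1 << (counts[c] * 26 + (ord(c) - 97))
        counts[c] += 1
    return n

# "ABA" sets bit 0 (first 'a'), bit 1 ('b') and bit 26 (second 'a').
print(word_bits("ABA") == ((1 << 0) | (1 << 1) | (1 << 26)))  # True

# Partition step: excluding the letter 'b' keeps only the hashes whose
# 'b' bit is not set.
hashes = [word_bits(w) for w in ("aa", "ab", "ta")]
b_bit = 1 << (ord("b") - 97)
print([h for h in hashes if not (h & b_bit)])  # keeps the "aa" and "ta" hashes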
from collections import defaultdict
import time
def as_number(word):
word = word.lower();
n = 0
for c in word:
m = ord(c) - 97
if n & (1 << m):
return 0
else:
n |= 1 << m
return n
def as_number_with_duplicates(word):
counts = defaultdict(lambda: 0)
word = word.lower();
n = 0
for c in word:
m = ord(c) - 97
n |= 1 << (counts[c] * 26 + m)
counts[c] += 1
return n
def get_letters(n):
letters = ""
i = 0
while n:
if n & 1:
letters += chr(97 + i % 26)
n >>= 1
i += 1
return letters
def add_to_freq(freq, bits):
i = 0
while bits:
if bits & 1:
freq[i] += 1
bits >>= 1
i += 1
def arrange_freq(freq):
return sorted([(x, i) for i, x in enumerate(freq)])
def get_tuple(tpl, i, k, seen, num_to_count_map):
bits = tpl[2]["bits"]
freq = [0] * len(tpl[2]["freq"])
nums = tpl[2]["nums"]
bit = 1 << i
new_bits = bits ^ bit
if new_bits in seen:
return None
seen.add(new_bits)
new_nums = []
count = 0
for num in nums:
if not (num & bit):
new_nums.append(num)
count += num_to_count_map[num]
add_to_freq(freq, num)
return (count, tpl[1] - 1, {
"bits": new_bits,
"freq": arrange_freq(freq),
"nums": new_nums
})
def upper_index(a, val, left, right):
if left == right:
return left
mid = (left + right + 1) // 2
if a[mid][0] > val:
return upper_index(a, val, left, mid-1)
else:
return upper_index(a, val, mid, right)
# tpl = (word_count, letter_count, {bits, freq, nums})
def solve(tpl, k, seen, num_to_count_map, global_best):
if tpl[1] == k:
print "cand: %s, %s, %s" % (get_letters(tpl[2]["bits"]), tpl[1], tpl[0])
if tpl[0] > global_best["count"]:
global_best["count"] = tpl[0]
global_best["bits"] = tpl[2]["bits"]
if tpl[1] > k:
freq = tpl[2]["freq"]
start = 1 + upper_index(freq, 0, 0, len(freq))
for idx in range(start, len(freq)):
if time.time() > global_best["end"]:
return
_freq, i = freq[idx]
new_tpl = get_tuple(tpl, i, k, seen, num_to_count_map)
# Prune
if not new_tpl or new_tpl[0] < global_best["count"]:
continue
solve(new_tpl, k, seen, num_to_count_map, global_best)
def f(words, k):
num_to_count_map = defaultdict(lambda: 0)
nums = set()
all_bits = 0
freq = []
for w in words:
if len(w) > k:
continue
num = as_number_with_duplicates(w)
nums.add(num)
num_bits = len(bin(num)[2:])
if len(freq) < num_bits:
freq.extend([0] * (num_bits - len(freq)))
add_to_freq(freq, num)
num_to_count_map[num] += 1
all_bits |= num
tpl = (len(words), bin(all_bits).count("1"), {"bits": all_bits, "freq": arrange_freq(freq), "nums": nums})
global_best = {
"count": -float('inf'),
"bits": 0,
"end": time.time() + (5*60 if k == 22 else 20)
}
solve(tpl, k, set([all_bits]), num_to_count_map, global_best)
return global_best
def format(w):
return "".join(sorted(w.lower()))
def are_equal(w1, w2):
return format(w1) == format(w2)
"""
import os
import datetime
path = "enable.txt"
words = []
with open(path) as file:
for values in file:
words.append(values.strip())
start = datetime.datetime.now()
for i in range(2, 31):
result = f(words, i)
formatted = format(get_letters(result["bits"]))
print(result["count"], formatted)
print(are_equal(formatted, btilly[i]))
print(datetime.datetime.now() - start)
"""
# Code by btilly
import re
import os
import datetime
path = "enable.txt"
words = []
with open(path) as file:
for values in file:
words.append(values.strip().lower())
key_count = {}
for word in words:
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
if key not in key_count:
key_count[key] = 1
else:
key_count[key] += 1
seen[letter] += 1
KEYS = sorted(key_count.keys(), key=lambda key: -key_count[key])
#print(KEYS)
#print(len(KEYS))
KEY_POS = {}
for i in range(len(KEYS)):
KEY_POS[KEYS[i]] = i
# Now we will build a trie. Every node has a list of words, and a dictionary
# from the next letter farther in the trie.
# BUT TRICK: we will map each word to a sequence of numbers, and those numbers
# will be indexes into KEYS. This lets us use the fact that a second 'e' is
# unlikely, in order to store that efficiently.
class Trie:
def __init__(self, path):
self.words = []
self.dict = {}
self.words_len = 0
self.count_words = 0
self.path = path
def add_word (self, word):
trie = self
poses = []
seen = {}
for letter in word:
if letter not in seen:
seen[letter] = 0
key = (letter, seen[letter])
poses.append(KEY_POS[(key)])
seen[letter] += 1
sorted_poses = sorted(poses);
for i in range(len(sorted_poses)):
trie.count_words += 1
pos = sorted_poses[i]
if pos not in trie.dict:
trie.dict[pos] = Trie(trie.path + KEYS[pos][0])
trie = trie.dict[pos]
trie.count_words += 1
trie.words_len += 1
trie.words.append(word)
def remove (self, pos):
trie = Trie(self.path)
trie.words = None
trie.dict = self.dict.copy()
trie.words_len = self.words_len
trie.count_words = self.count_words
trie.path = self.path
if pos in trie.dict:
trie.count_words -= trie.dict[pos].count_words
trie.dict.pop(pos)
return trie
def merge (self, other):
trie = Trie(self.path)
trie.words = None
trie.words_len = self.words_len + other.words_len
trie.count_words = self.count_words + other.count_words
trie.path = self.path
trie.dict = self.dict.copy()
for pos, subtrie in other.dict.items():
if pos in trie.dict:
trie.dict[pos] = trie.dict[pos].merge(subtrie)
else:
trie.dict[pos] = subtrie
return trie
def __repr__ (self):
answer = self.path + ":\n words_len: " + str(self.words_len) + "\n count_words:" + str(self.count_words) + " \n dict:\n"
def indent (string):
p = re.compile("^", re.M)
return p.sub(" ", string)
for pos in sorted(self.dict.keys()):
subtrie = self.dict[pos]
answer = answer + indent(str(pos) + ":\n" + subtrie.__repr__())
return answer
base_trie = Trie('')
for word in words:
base_trie.add_word(word)
base_tries = [base_trie]
last_trie = base_trie
for i in range(len(KEYS)):
subtrie = last_trie.dict[i]
last_trie = last_trie.remove(i).merge(subtrie)
base_tries.append(last_trie)
#print(base_tries[1].__repr__())
def best_solution (size):
def solve (subset, pos, best, partial):
found = sum(x[0] for x in partial)
upper_bound = sum(x[1] for x in partial)
if size <= len(subset) or upper_bound < best or len(KEYS) <= pos:
return (found, subset)
if best < found:
best = found
# Figure out our next calculations.
partial_include = []
partial_exclude = []
finalized_found = 0
for this_found, this_bound, this_trie in partial:
if this_trie is None:
# This is a generic record of already emptied tries
finalized_found += this_found
elif pos in this_trie.dict:
include_trie = this_trie.dict[pos]
partial_include.append((
this_found + include_trie.words_len,
include_trie.count_words + this_found,
include_trie
))
# We included the tally of found words in the previous partial.
# So do not double-count by including it again
partial_include.append((
0,
this_bound - include_trie.count_words - this_found,
this_trie
))
partial_exclude.append((
this_found,
this_bound - include_trie.count_words,
this_trie
))
elif this_found == this_bound:
finalized_found += this_found
else:
partial_include.append((
this_found,
this_bound,
this_trie
))
partial_exclude.append((
this_found,
this_bound,
this_trie
))
if 0 < finalized_found:
partial_include.append(
(finalized_found, finalized_found, None)
)
partial_exclude.append(
(finalized_found, finalized_found, None)
)
found_include, subset_include = solve(subset + [pos], pos+1, best, partial_include)
if best < found_include:
best = found_include
found_exclude, subset_exclude = solve(subset, pos+1, best, partial_exclude)
if found_include < found_exclude:
return (found_exclude, subset_exclude)
else:
return (found_include, subset_include)
best_count = 0
best_subset = []
start_subset = [i for i in range(size)]
while 0 < len(start_subset):
trie = base_tries[len(start_subset)].remove(start_subset[-1]+1)
count, subset = solve(list(start_subset), start_subset[-1]+2, best_count, [(trie.words_len, trie.count_words, trie)])
if best_count < count:
best_count = count
best_subset = subset
start_subset.pop()
return ''.join([KEYS[x][0] for x in best_subset])
# End code by btilly
for i in range(25, len(KEYS)):
start = datetime.datetime.now()
btilly = best_solution(i)
print(btilly)
print(datetime.datetime.now() - start)
start = datetime.datetime.now()
result = f(words, i)
formatted = format(get_letters(result["bits"]))
print(formatted)
print(datetime.datetime.now() - start)
print(are_equal(btilly, formatted))
print("")