
Algorithm: What set of tiles of length N can be used to generate the most amount of Scrabble-valid words?

I am trying to create a function best_tiles which takes the number of tiles in your hand and returns the set of tiles that allows you to produce the greatest number of unique English-valid words, assuming you can only use each tile once.

For example, with the set of tiles in your hand (A, B, C) you can spell CAB, BAC, AB, and BA (all English words), so you can spell 4 unique words with that set. With (B, A, A), you can spell 5 words: ABA, BAA, AA, AB, and BA. The goal is to find the set of letters which allows you to spell the most English-valid words (without replacement).

So if 5 is the maximum number of words that can be spelled with any combination of letters for N = 3, running best_tiles(n = 3) would return B, A, A.

I'm wondering how to implement this efficiently? My current approach doesn't scale well at all with the number of letters.

I read in a word list. In this case, I am using enable.txt here: https://www.wordgamedictionary.com/enable/

import os
path = "enable.txt"
words = []
with open(path, encoding='utf8') as f:
    for values in f:
        words.append(list(values.strip().upper()))

I created a function word_in_tiles (h/t smack89) which returns whether it is possible to construct a word given a tile set:

import collections

def word_in_tiles(word, tiles):
  tiles_counter = collections.Counter(tiles)
  return all(tiles_counter.get(ch, 0) >= cnt
             for ch, cnt in collections.Counter(word).items())

I then created a function get_all_words which produces a list of all the possible words one can spell from a word list and a tile set:

def get_all_words(tile_set, words):
    # words is a word list
    return [i for i in words if word_in_tiles(i, tile_set)]
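As a quick sanity check, here is a self-contained version of those two helpers run on the (B, A, A) example from the question. Note the `>=` comparison: the tiles only need at least as many copies of each letter as the word requires.

```python
import collections

def word_in_tiles(word, tiles):
    # A word is spellable if the tiles supply at least as many
    # copies of each letter as the word requires.
    tiles_counter = collections.Counter(tiles)
    return all(tiles_counter.get(ch, 0) >= cnt
               for ch, cnt in collections.Counter(word).items())

def get_all_words(tile_set, words):
    return [w for w in words if word_in_tiles(w, tile_set)]

word_list = ["AA", "AB", "BA", "ABA", "BAA", "CAB", "BAC"]
print(get_all_words("BAA", word_list))  # ['AA', 'AB', 'BA', 'ABA', 'BAA']
```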

The extremely naive way of identifying which tile set is the "best" for three letters is the following:

I first create a list of every possible combination for a given length. So for length 3 I'd do:

import string
import itertools 

letters = string.ascii_lowercase
all_three_letter_combinations = list(itertools.combinations_with_replacement(letters, 3))

# Create a list of only the words that are three letters or fewer
three_letter = [i for i in words if len(i) <= 3]

sequence_dict = dict()
for i in all_three_letter_combinations:
    string_test = "".join(i).upper()
    sequence_dict[i] = get_all_words(string_test, three_letter)

Then I remove the entries with empty values and sort by the length of the results:

res = {k: v for k, v in sequence_dict.items() if len(v) >= 1}

def GetMaxLength(res):
    return max((len(v), v, k) for k, v in res.items())[1:]

GetMaxLength(res)

You get the idea: for three letters, the tile set that produces the most English-valid words is T, A, E, which can produce the words ['AE', 'AT', 'ATE', 'EAT', 'ET', 'ETA', 'TA', 'TAE', 'TEA'].

I want to be able to scale this up to N = 15. What is the best procedure for doing this?
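To make the brute-force baseline concrete, here is a minimal self-contained sketch of a naive best_tiles(n) along the lines above, run against a tiny hard-coded toy dictionary rather than enable.txt (it is only feasible for small n, since the number of tile combinations grows combinatorially):

```python
import collections
import itertools
import string

def word_in_tiles(word, tiles):
    tiles_counter = collections.Counter(tiles)
    return all(tiles_counter.get(ch, 0) >= cnt
               for ch, cnt in collections.Counter(word).items())

def best_tiles(n, words):
    # Only words of length <= n can ever be spelled with n tiles.
    short_words = [w for w in words if len(w) <= n]
    best_set, best_count = None, -1
    # Try every multiset of n uppercase letters.
    for tiles in itertools.combinations_with_replacement(string.ascii_uppercase, n):
        count = sum(word_in_tiles(w, tiles) for w in short_words)
        if count > best_count:
            best_set, best_count = tiles, count
    return best_set, best_count

# Tiny toy dictionary instead of enable.txt:
toy_words = ["AA", "AB", "BA", "ABA", "BAA", "CAB", "BAC", "AT", "TA"]
print(best_tiles(3, toy_words))  # (('A', 'A', 'B'), 5)
```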

If you select with replacement, you always get the same number of arrangements, but a combination of unique characters yields more unique strings:

  • AA: AA, AA, A, A (only AA and A are unique)
  • AB: AB, BA, A, B (all four are unique)

Because this gets very large very quickly, rather than statically creating every combination, you can create a generator which produces all the values as-needed (as input to some other computation).

import string
import itertools

def gen_substrs_with_replacement(maxlen, max_chars=26):
    letters = string.ascii_lowercase[:max_chars]
    for length in range(1, maxlen + 1):  # start at 1 and end at N
        yield from itertools.combinations_with_replacement(letters, length)

However, if you only want the count of values, you can calculate it with the binomial formula for choosing with replacement.

Note that if you want the substrings too, you actually need the sum of the series of successively smaller selections (for a selection of length N, also count the selections of length N-1, N-2, and so on down to 1).

from math import comb as nchoosek

# n choose k with replacement, optionally including every shorter selection
def nCk_r(source_length, selections, sublist=False):
    if sublist and selections > 1:
        return nchoosek(source_length + selections - 1, selections) + nCk_r(source_length, selections - 1, True)
    return nchoosek(source_length + selections - 1, selections)

>>> list(gen_substrs_with_replacement(2, 3))
[('a',), ('b',), ('c',), ('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]
>>> len(list(gen_substrs_with_replacement(2, 3)))
9
>>> nCk_r(2,3,True)
9
>>> len(list(gen_substrs_with_replacement(12, 10)))
646645
>>> nCk_r(12,10,True)
646645

The goal should be to make your algorithm run on the order of O(size of the enable list). That is, we don't want to waste time generating words which aren't in the list; that way you can be sure your algorithm is efficient.


  1. The first thing you should probably do is partition the list by word length.

    So split your enable.txt file into files containing all the 2-letter words, the 3-letter words, etc. Perhaps name them enable_2.txt, enable_3.txt, and so on.

    length_map = collections.defaultdict(list)
    with open("enable.txt", "r") as f:
      for word in f:
        word = word.strip()
        length_map[len(word)].append(word)

    for n in length_map:
      with open(f"enable_{n}.txt", "w+") as f:
          f.write("\n".join(length_map[n]))
    
    

    You don't have to do this, but it definitely helps speed up the next tasks.

  2. Now if you have n tiles, you want to read each of the enable_k.txt files containing words of n or fewer letters.

    If you didn't do the first part, then just read the enable.txt file you have.

    For each word in the file, you need to check whether your tiles contain all the letters in that word.

    Here is a python function you can use to check:

    import collections
    
    def word_in_tiles(word, tiles):
      tiles_counter = collections.Counter(tiles)
      return all(tiles_counter.get(ch, 0) >= cnt for ch, cnt in collections.Counter(word).items())
    

    If the function returns True for any word, you can print that word.

This approach is much faster than generating all possible combinations of tiles.

In the worst case, this method is as fast as if you had somehow already generated all the valid words in enable.txt and were filtering out the ones that don't have the right length.

This code can optimize n=15 in a couple of minutes on my laptop with PyPy, finding

10701 acdegilmnoprstu.

The idea is to do branch and bound, where at each node some letters are forced to be included and others are excluded. We derive an upper bound on the quality of each node by finding an order-preserving map f (preserving the partial order of multiset inclusion) from multisets of letters to a smaller partially ordered space, and then counting the number of words where f(word) is included in the best f(tiles). On the smaller space we can brute-force the problem with a fast convolution method (reminiscent of the FFT).

To find a good space, we greedily remove letters one at a time, affecting as few words as possible, until the upper bound can be brute-forced.
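The sum_over_subsets pass used below is the standard subset-sum (SOS) dynamic-programming transform: after it runs, each bitmask holds the total of the original values over all of its submasks. A tiny illustration over a 3-letter alphabet (the word-to-bitmask reduction here is just for the example):

```python
def sum_over_subsets(vector, dimension):
    # After this pass, vector[mask] equals the sum of the original
    # values over every submask of mask (including mask itself).
    for i in range(dimension):
        bit = 1 << i
        for bitmap in range(1 << dimension):
            if not (bitmap & bit):
                vector[bitmap | bit] += vector[bitmap]

# Counts of words per exact letter set over the alphabet {a, b, c}:
#   {a} -> 0b001, {a, b} -> 0b011, {c} -> 0b100
v = [0] * 8
v[0b001] = 1   # one word using exactly {a}
v[0b011] = 2   # two words using exactly {a, b}
v[0b100] = 1   # one word using exactly {c}
sum_over_subsets(v, 3)
print(v[0b011])  # 3: words spellable from the letters {a, b}
print(v[0b111])  # 4: words spellable from all three letters
```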

import array
import collections
import functools
import heapq


def count_occurrences_of_letters(raw_word):
    occurs = collections.Counter()
    word = []
    for letter in raw_word:
        word.append(letter + str(occurs[letter]))
        occurs[letter] += 1
    return word


def greedy_censorship_order(words):
    hits = collections.defaultdict(set)
    for index, word in enumerate(words):
        for letter in word:
            hits[letter].add(index)
    order = []
    while hits:
        censored_letter = min(hits.keys(), key=lambda letter: len(hits[letter]))
        order.append(censored_letter)
        for index in hits[censored_letter]:
            for letter in words[index]:
                if letter != censored_letter:
                    hits[letter].remove(index)
        del hits[censored_letter]
    return order


def bitmap_from_word(word, alphabet):
    bitmap = 0
    censored = 0
    for letter in word:
        try:
            bitmap |= 1 << alphabet.index(letter)
        except ValueError:
            censored += 1
    return bitmap, censored


def sum_over_subsets(vector, dimension):
    for i in range(dimension):
        bit = 1 << i
        for bitmap in range(1 << dimension):
            if not (bitmap & bit):
                vector[bitmap | bit] += vector[bitmap]


def count_set_bits(n):
    return bin(n).count("1")


@functools.total_ordering
class Node:
    def __init__(self, subset, n, unfiltered_words):
        self.subset = subset
        self.n = n
        self.words = [word for word in unfiltered_words if len(word) <= n]
        self.upper_bound = sum(not word for word in self.words)
        if n == 0:
            return
        order = greedy_censorship_order(self.words)
        if not order:
            self.pivot = None
            return
        self.pivot = order[-1]
        alphabet = order[-(n + 7) :]
        zeros = [0] * (1 << len(alphabet))
        vectors = [array.array("l", zeros) for i in range(n + 1)]
        for word in self.words:
            bitmap, censored = bitmap_from_word(word, alphabet)
            for i in range(censored, n + 1):
                vectors[i][bitmap] += 1
        for vector in vectors:
            sum_over_subsets(vector, len(alphabet))
        self.upper_bound = max(
            vectors[n - count_set_bits(bitmap)][bitmap]
            for bitmap in range(1 << len(alphabet))
            if count_set_bits(bitmap) <= n
        )

    def children(self):
        if self.pivot is None:
            return
        yield Node(
            self.subset, self.n, [word for word in self.words if self.pivot not in word]
        )
        yield Node(
            self.subset | {self.pivot},
            self.n - 1,
            [
                [letter for letter in word if letter != self.pivot]
                for word in self.words
            ],
        )

    def __eq__(self, other):
        return self.upper_bound == other.upper_bound

    def __ne__(self, other):
        return self.upper_bound != other.upper_bound

    def __lt__(self, other):
        return self.upper_bound > other.upper_bound


def solve(n, words):
    heap = [Node(set(), n, words)]
    while True:
        top = heapq.heappop(heap)
        print(top.upper_bound, "".join(sorted(letter[0] for letter in top.subset)))
        if top.n == 0:
            return
        for child in top.children():
            heapq.heappush(heap, child)


def main():
    with open("enable.txt") as file:
        raw_words = file.read().split()
    words = [count_occurrences_of_letters(word) for word in raw_words]
    solve(15, words)


if __name__ == "__main__":
    main()

I think this should be good enough!

Here is a log of my code running under PyPy:

0:00:00.000232
E
0:00:00.001251
ER
0:00:00.048733
EAT
0:00:00.208744
ESAT
0:00:00.087425
ESATL
0:00:00.132049
ESARTP
0:00:00.380296
ESARTOP
0:00:01.409129
ESIARTLP
0:00:03.433526
ESIARNTLP
0:00:10.391252
ESIARNTOLP
0:00:25.651012
ESIARNTOLDP
0:00:56.642405
ESIARNTOLCDP
0:01:57.257293
ESIARNTOLCDUP
0:03:55.933906
ESIARNTOLCDUPM
0:07:17.146036
ESIARNTOLCDUPMG
0:10:14.844347
ESIARNTOLCDUPMGH
0:13:34.722600
ESIARNTOLCDEUPMGH
0:18:14.215019
ESIARNTOLCDEUPMGSH
0:22:47.129284
ESIARNTOLCDEUPMGSHB
0:27:56.859511
ESIARNTOLCDEUPMGSHBYK
0:46:20.448502
ESIARNTOLCDEUPMGSHBYAK
0:57:15.213635
ESIARNTOLCDEUPMGSHIBYAT
1:09:55.530180
ESIARNTOLCDEUPMGSHIBYATF
1:18:35.209599
ESIARNTOLCDEUPMGSHIBYATRF
1:21:54.095119
ESIARNTOLCDEUPMGSHIBYATRFV
1:20:16.978411
ESIARNTOLCDEUPMGSHIBYAOTRFV
1:14:24.253660
ESIARNTOLCDEUPMGSHIBYAONTRFV
1:00:37.405571

The key improvements are these.

  1. I distinguish not just letters, but how many times each letter has been seen. That way each letter is something I can either take or move on from. This was an idea I got while commenting on David Eisenstat's solution.
  2. From him I also learned that pruning trees from which the answer cannot come controls the growth of the problem fairly well.
  3. The very first solution that I look at is simply all of the top letters. This is a good solution, so even though this is depth-first, we prune fairly well.
  4. I was careful to consolidate "exhausted tries" into a single record. This reduces the amount of data we have to throw away.
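The first improvement can be illustrated in isolation: each word is turned into a sequence of (letter, occurrence-index) keys, so that a second 'A' is a different symbol from the first and can be accepted or skipped independently. A small sketch of that encoding (the helper name here is mine, not from the code below):

```python
def letter_occurrence_keys(word):
    # Map each letter to a (letter, how-many-times-seen-before) key,
    # so duplicate letters become distinct symbols.
    seen = {}
    keys = []
    for letter in word:
        count = seen.get(letter, 0)
        keys.append((letter, count))
        seen[letter] = count + 1
    return keys

print(letter_occurrence_keys("BAA"))
# [('B', 0), ('A', 0), ('A', 1)]
```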

Here is the code.

import os
import datetime
path = "enable.txt"
words = []
with open(path) as f:
    for values in f:
        words.append(values.strip().upper())

key_count = {}
for word in words:
    seen = {}
    for letter in word:
        if letter not in seen:
            seen[letter] = 0
        key = (letter, seen[letter])
        if key not in key_count:
            key_count[key] = 1
        else:
            key_count[key] += 1
        seen[letter] += 1


KEYS = sorted(key_count.keys(), key=lambda key: -key_count[key])
#print(KEYS)
#print(len(KEYS))
KEY_POS = {}
for i in range(len(KEYS)):
    KEY_POS[KEYS[i]] = i

# Now we will build a trie.  Every node has a list of words, and a dictionary
# from the next letter farther in the trie.
# BUT TRICK:, we will map each word to a sequence of numbers, and those numbers
# will be indexes into KEYS.  This allows us to use the fact that a second 'e' is
# unlikely, so we can deal with that efficiently.
class Trie:
    def __init__(self, path):
        self.words = []
        self.dict = {}
        self.min_pos = -1
        self.max_pos = -1
        self.count_words = 0
        self.path = path

    def add_word (self, word):
        trie = self

        poses = []
        seen = {}
        for letter in word:
            if letter not in seen:
                seen[letter] = 0
            key = (letter, seen[letter])
            poses.append(KEY_POS[(key)])
            seen[letter] += 1
        sorted_poses = sorted(poses)
        for i in range(len(sorted_poses)):
            trie.count_words += 1
            pos = sorted_poses[i]
            if pos not in trie.dict:
                trie.dict[pos] = Trie(trie.path + KEYS[pos][0])
                if trie.max_pos < pos:
                    trie.max_pos = pos
            trie = trie.dict[pos]
        trie.count_words += 1
        trie.words.append(word)


base_trie = Trie('')
for word in words:
    base_trie.add_word(word)

def best_solution (size):
    def solve (subset, pos, best, partial):
        found = sum(x[0] for x in partial)
        upper_bound = sum(x[1] for x in partial)
        if size <= len(subset) or upper_bound < best or len(KEYS) <= pos:
            return (found, subset)
        if best < found:
            best = found
        # Figure out our next calculations.
        partial_include = []
        partial_exclude = []
        finalized_found = 0
        for this_found, this_bound, this_trie in partial:
            if this_trie is None:
                # This is a generic record of already emptied tries
                finalized_found += this_found
            elif pos in this_trie.dict:
                include_trie = this_trie.dict[pos]
                partial_include.append((
                    this_found + len(include_trie.words),
                    include_trie.count_words + this_found,
                    include_trie
                ))
                # We included the tally of found words in the previous partial.
                # So do not double-count by including it again
                partial_include.append((
                    0,
                    this_bound - include_trie.count_words - this_found,
                    this_trie
                ))
                partial_exclude.append((
                    this_found,
                    this_bound - include_trie.count_words,
                    this_trie
                ))
            elif this_found == this_bound:
                finalized_found += this_found
            else:
                partial_include.append((
                    this_found,
                    this_bound,
                    this_trie
                ))

                partial_exclude.append((
                    this_found,
                    this_bound,
                    this_trie
                ))
        if 0 < finalized_found:
            partial_include.append(
                (finalized_found, finalized_found, None)
            )
            partial_exclude.append(
                (finalized_found, finalized_found, None)
            )

        found_include, subset_include = solve(subset + [pos], pos+1, best, partial_include)
        if best < found_include:
            best = found_include
        found_exclude, subset_exclude = solve(subset, pos+1, best, partial_exclude)
        if found_include < found_exclude:
            return (found_exclude, subset_exclude)
        else:
            return (found_include, subset_include)


    count, subset = solve([], 0, 0, [(len(base_trie.words), base_trie.count_words, base_trie)])
    return ''.join([KEYS[x][0] for x in subset])

for i in range(20):
    start = datetime.datetime.now()
    print(best_solution(i))
    print(datetime.datetime.now() - start)

Here is a "dumbo" sum-over-subsets that accumulates, for every count of distinct letters from 1 to 26, the selection yielding the most words in the file "enable.txt", in about 33 seconds on my laptop. (The 32 seconds shown below reflects a speedup: changing my original code, which ran in 6 minutes 45 seconds, to an in-place method.)

Since btilly and David Eisenstat have already done the hard work of optimizing a search that also includes duplicates, we know that the information here for up to 16 letters is useful.

from collections import defaultdict

def as_number(word):
    word = word.lower();
    n = 0
    for c in word:
        m = ord(c) - 97
        if n & (1 << m):
            return 0
        else:
            n |= 1 << m
    return n

def get_letters(n):
    letters = ""
    i = 0
    while n:
        if n & 1:
            letters += chr(97 + i)
        n >>= 1
        i += 1
    return letters

def f(words, N):
    hash = defaultdict(lambda: 0) #[0] * (1 << N)

    for w in words:
        num = as_number(w)
        if num:
            hash[num] += 1 #= -~hash[num]
  
    dp = [hash.get(mask, 0) for mask in range(1 << N)]

    for i in range(N):
        for mask in range(1 << N):
            if not (mask & (1 << i)):
                dp[mask ^ (1 << i)] += dp[mask]

    result = {}

    for i in range(1, 1 << N):
        k = bin(i).count("1")
        if k in result:
            if result[k]["best"] == dp[i]:
                result[k]["best_letters"].append(get_letters(i))
            elif result[k]["best"] < dp[i]:
                result[k]["best"] = dp[i]
                result[k]["best_letters"] = [get_letters(i)]
        elif dp[i]:
            result[k] = {
                "best": dp[i],
                "best_letters": [get_letters(i)]
            }

    return result

import os
import datetime
path = "enable.txt"
words = []
with open(path) as file:
    for values in file:
        words.append(values.strip())

start = datetime.datetime.now()
print(f(words, 26))
print(datetime.datetime.now() - start)

Output:

// ♥ pypy py.py
{
    2: {
        'best': 2,
        'best_letters': ['ab', 'de', 'ah', 'eh', 'al', 'am', 'em', 'an', 'en', 'do', 'ho', 'mo', 'no', 'er', 'is', 'os', 'at', 'it', 'mu', 'nu', 'ow', 'ay', 'oy']
    },
    3: {
        'best': 9,
        'best_letters': ['aet']
    },
    4: {
        'best': 24,
        'best_letters': ['aest']
    },
    5: {
        'best': 66,
        'best_letters': ['aelst']
    },
    6: {
        'best': 150,
        'best_letters': ['aeprst']
    },
    7: {
        'best': 283,
        'best_letters': ['aeoprst']
    },
    8: {
        'best': 543,
        'best_letters': ['aeilprst']
    },
    9: {
        'best': 945,
        'best_letters': ['aeilnprst']
    },
    10: {
        'best': 1590,
        'best_letters': ['aeilnoprst']
    },
    11: {
        'best': 2557,
        'best_letters': ['adeilnoprst']
    },
    12: {
        'best': 3855,
        'best_letters': ['acdeilnoprst']
    },
    13: {
        'best': 5648,
        'best_letters': ['acdeilnoprstu']
    },
    14: {
        'best': 8001,
        'best_letters': ['acdeilmnoprstu']
    },
    15: {
        'best': 10701,
        'best_letters': ['acdegilmnoprstu']
    },
    16: {
        'best': 14060,
        'best_letters': ['acdeghilmnoprstu']
    },
    17: {
        'best': 17225,
        'best_letters': ['abcdeghilmnoprstu']
    },
    18: {
        'best': 20696,
        'best_letters': ['abcdeghilmnoprstuy']
    },
    19: {
        'best': 23723,
        'best_letters': ['abcdeghiklmnoprstuy']
    },
    20: {
        'best': 26542,
        'best_letters': ['abcdefghiklmnoprstuy']
    },
    21: {
        'best': 29501,
        'best_letters': ['abcdefghiklmnoprstuwy']
    },
    22: {
        'best': 31717,
        'best_letters': ['abcdefghiklmnoprstuvwy']
    },
    23: {
        'best': 32675,
        'best_letters': ['abcdefghiklmnoprstuvwyz']
    },
    24: {
        'best': 33548,
        'best_letters': ['abcdefghiklmnoprstuvwxyz']
    },
    25: {
        'best': 34299,
        'best_letters': ['abcdefghijklmnoprstuvwxyz']
    },
    26: {
        'best': 34816,
        'best_letters': ['abcdefghijklmnopqrstuvwxyz']
    }
}
0:00:32.295888

This is my previous answer with another idea added, borrowed from David Eisenstat. That idea being to start with a data structure that squashes out the differences caused by common letters. The speedup was quite good.

Here is a full log of the code running under PyPy. Once you get fairly far along it goes very, very fast because the data structure is quite small. The slowest was 23 letters.

0:00:00.000226

0:00:00.000095
ER
0:00:00.000514
EAT
0:00:00.014545
ESAT
0:00:00.069521
ESATL
0:00:00.063586
ESARTP
0:00:00.100273
ESARTOP
0:00:00.309665
ESIARTLP
0:00:01.171279
ESIARNTLP
0:00:02.435968
ESIARNTOLP
0:00:05.049897
ESIARNTOLDP
0:00:11.358067
ESIARNTOLCDP
0:00:23.378628
ESIARNTOLCDUP
0:00:38.672561
ESIARNTOLCDUPM
0:00:57.530691
ESIARNTOLCDUPMG
0:01:23.665592
ESIARNTOLCDUPMGH
0:01:43.571878
ESIARNTOLCDEUPMGH
0:02:02.831546
ESIARNTOLCDEUPMGSH
0:02:13.175045
ESIARNTOLCDEUPMGSHB
0:02:15.644423
ESIARNTOLCDEUPMGSHBY
0:02:25.681542
ESIARNTOLCDEUPMGSHBYK
0:02:38.858906
ESIARNTOLCDEUPMGSHBYAK
0:02:44.703865
ESIARNTOLCDEUPMGSHIBYAT
0:02:52.872105
ESIARNTOLCDEUPMGSHIBYATF
0:02:38.724424
ESIARNTOLCDEUPMGSHIBYATRF
0:02:19.892021
ESIARNTOLCDEUPMGSHIBYATRFV
0:02:01.069136
ESIARNTOLCDEUPMGSHIBYAOTRFV
0:01:39.992728
ESIARNTOLCDEUPMGSHIBYAONTRFV
0:01:09.800219
ESIARNTOLCDEUPMGSHIBYAONTRFVK
0:00:46.791619
ESIARNTOLCDEUPMGSHIBYAONTRFVKW
0:00:30.404072
ESIARNTOLCDEUPMGSHIBYAONTRFVLKW
0:00:17.932808
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWC
0:00:16.610114
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWEC
0:00:13.385111
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWECZ
0:00:11.731393
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZ
0:00:09.263752
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZD
0:00:08.459759
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZDP
0:00:06.778232
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZDPX
0:00:05.502753
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPX
0:00:05.504382
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXM
0:00:03.345941
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMU
0:00:02.786702
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUG
0:00:02.258980
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGQ
0:00:02.181486
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGQJ
0:00:02.069378
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGOHQ
0:00:01.678280
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGOHQJ
0:00:01.758940
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJ
0:00:01.015172
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJA
0:00:00.782816
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJAB
0:00:00.590803
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJABF
0:00:00.560564
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJABFT
0:00:00.789469
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFT
0:00:00.213274
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTR
0:00:00.142143
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTER
0:00:00.117293
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERL
0:00:00.062241
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLY
0:00:00.044667
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIY
0:00:00.032022
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYK
0:00:00.033368
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKC
0:00:00.028749
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCV
0:00:00.028905
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVW
0:00:00.038960
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVWZ
0:00:00.032474
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZ
0:00:00.031382
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZD
0:00:00.032725
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDG
0:00:00.028246
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGO
0:00:00.026071
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGON
0:00:00.023227
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONM
0:00:00.019831
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMP
0:00:00.015209
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPU
0:00:00.011805
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUE
0:00:00.013564
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEI
0:00:00.011815
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIA
0:00:00.005378
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIAB
0:00:00.013570
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABL
0:00:00.004920
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLH
0:00:00.002972
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHT
0:00:00.002923
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTF
0:00:00.003162
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSF
0:00:00.004121
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFR
0:00:00.004218
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJ
0:00:00.005604
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJC
0:00:00.006316
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCY
0:00:00.012737
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYN
0:00:00.010365
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNO
0:00:00.011111
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOI
0:00:00.013791
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIG
0:00:00.016119
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGD
0:00:00.017771
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDX
0:00:00.015684
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXW
0:00:00.014175
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZW
0:00:00.015253
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWU
0:00:00.009111
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEU
0:00:00.009990
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUK
0:00:00.013182
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKQ
0:00:00.003639
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKQV
0:00:00.002877
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQV
0:00:00.001968
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVM
0:00:00.001155
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMA
0:00:00.000890
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAP
0:00:00.000592
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPF
0:00:00.000894
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFB
0:00:00.000512
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTB
0:00:00.000452
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBE
0:00:00.000365
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBEI
0:00:00.000277
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBEIZ
0:00:00.000368
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBEIZK
0:00:00.000321
ESIARNTOLCDEUPMGSHIBYAONTRFVLKWESCZIDPXMUGONHQJASBFTERLIYKCVSWZDGONMPUEIABLHTSFRJCYNOIGDXZWEUKSQVMAPFTBEIZKL
0:00:00.000275

Here is the code:

import re
import os
import datetime
path = "enable.txt"
words = []
with open(path) as f:
    for values in f:
        words.append(values.strip().upper())

key_count = {}
for word in words:
    seen = {}
    for letter in word:
        if letter not in seen:
            seen[letter] = 0
        key = (letter, seen[letter])
        if key not in key_count:
            key_count[key] = 1
        else:
            key_count[key] += 1
        seen[letter] += 1


KEYS = sorted(key_count.keys(), key=lambda key: -key_count[key])
#print(KEYS)
#print(len(KEYS))
KEY_POS = {}
for i in range(len(KEYS)):
    KEY_POS[KEYS[i]] = i

# Now we will build a trie.  Every node has a list of words, and a dictionary
# from the next letter farther in the trie.
# BUT TRICK:, we will map each word to a sequence of numbers, and those numbers
# will be indexes into KEYS.  This allows us to use the fact that a second 'e' is
# unlikely to store that efficiently.
class Trie:
    def __init__(self, path):
        self.words = []
        self.dict = {}
        self.words_len = 0
        self.count_words = 0
        self.path = path

    def add_word (self, word):
        trie = self

        poses = []
        seen = {}
        for letter in word:
            if letter not in seen:
                seen[letter] = 0
            key = (letter, seen[letter])
            poses.append(KEY_POS[(key)])
            seen[letter] += 1
        sorted_poses = sorted(poses);
        for i in range(len(sorted_poses)):
            trie.count_words += 1
            pos = sorted_poses[i]
            if pos not in trie.dict:
                trie.dict[pos] = Trie(trie.path + KEYS[pos][0])
            trie = trie.dict[pos]
        trie.count_words += 1
        trie.words_len += 1
        trie.words.append(word)

    def remove (self, pos):
        trie = Trie(self.path)
        trie.words = None
        trie.dict = self.dict.copy()
        trie.words_len = self.words_len
        trie.count_words = self.count_words
        trie.path = self.path
        if pos in trie.dict:
            trie.count_words -= trie.dict[pos].count_words
            trie.dict.pop(pos)
        return trie

    def merge (self, other):
        trie = Trie(self.path)
        trie.words = None
        trie.words_len = self.words_len + other.words_len
        trie.count_words = self.count_words + other.count_words
        trie.path = self.path
        trie.dict = self.dict.copy()
        for pos, subtrie in other.dict.items():
            if pos in trie.dict:
                trie.dict[pos] = trie.dict[pos].merge(subtrie)
            else:
                trie.dict[pos] = subtrie
        return trie

    def __repr__ (self):
        answer = self.path + ":\n  words_len: " + str(self.words_len) + "\n  count_words:" + str(self.count_words) + " \n  dict:\n"
        def indent (string):
            p = re.compile("^", re.M)
            return p.sub("    ", string)
        for pos in sorted(self.dict.keys()):
            subtrie = self.dict[pos]
            answer = answer + indent(str(pos) + ":\n" + subtrie.__repr__())
        return answer

base_trie = Trie('')
for word in words:
    base_trie.add_word(word)

base_tries = [base_trie]
last_trie = base_trie
for i in range(len(KEYS)):
    subtrie = last_trie.dict[i]
    last_trie = last_trie.remove(i).merge(subtrie)
    base_tries.append(last_trie)

#print(base_tries[1].__repr__())

def best_solution (size):
    def solve (subset, pos, best, partial):
        found = sum(x[0] for x in partial)
        upper_bound = sum(x[1] for x in partial)
        if size <= len(subset) or upper_bound < best or len(KEYS) <= pos:
            return (found, subset)
        if best < found:
            best = found

        # Figure out our next calculations.
        partial_include = []
        partial_exclude = []
        finalized_found = 0
        for this_found, this_bound, this_trie in partial:
            if this_trie is None:
                # This is a generic record of already emptied tries
                finalized_found += this_found
            elif pos in this_trie.dict:
                include_trie = this_trie.dict[pos]
                partial_include.append((
                    this_found + include_trie.words_len,
                    include_trie.count_words + this_found,
                    include_trie
                ))
                # We included the tally of found words in the previous partial.
                # So do not double-count by including it again
                partial_include.append((
                    0,
                    this_bound - include_trie.count_words - this_found,
                    this_trie
                ))
                partial_exclude.append((
                    this_found,
                    this_bound - include_trie.count_words,
                    this_trie
                ))
            elif this_found == this_bound:
                finalized_found += this_found
            else:
                partial_include.append((
                    this_found,
                    this_bound,
                    this_trie
                ))

                partial_exclude.append((
                    this_found,
                    this_bound,
                    this_trie
                ))
        if 0 < finalized_found:
            partial_include.append(
                (finalized_found, finalized_found, None)
            )

            partial_exclude.append(
                (finalized_found, finalized_found, None)
            )

        found_include, subset_include = solve(subset + [pos], pos+1, best, partial_include)
        if best < found_include:
            best = found_include
        found_exclude, subset_exclude = solve(subset, pos+1, best, partial_exclude)
        if found_include < found_exclude:
            return (found_exclude, subset_exclude)
        else:
            return (found_include, subset_include)


    best_count = 0
    best_subset = []
    start_subset = [i for i in range(size)]
    while 0 < len(start_subset):
        trie = base_tries[len(start_subset)].remove(start_subset[-1]+1)
        count, subset = solve(list(start_subset), start_subset[-1]+2, best_count, [(trie.words_len, trie.count_words, trie)])
        if best_count < count:
            best_count = count
            best_subset = subset
        start_subset.pop()
    return ''.join([KEYS[x][0] for x in best_subset])

for i in range(len(KEYS)):
    start = datetime.datetime.now()
    print(best_solution(i))
    print(datetime.datetime.now() - start)

Here's an approach that reproduces the results listed in the question, but caps the execution time for each result at 20 seconds (except for length 22, and possibly a few other outliers) :) ^^

The method uses bitsets and prioritizes letter frequencies in the word list (words are hashed to 201-bit letter bitsets so that duplicate letters get their own bits), partitioning the relevant word list by keeping only the word hashes that contain no newly excluded letter.

It's a hack that exits the recursion early, since the greedy branches seem skewed. I'd welcome approaches that exploit the ideas here (nonetheless, I plan to study btilly's and David Eisenstat's answers more carefully).
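The duplicate-aware hashing described above can be sketched in isolation (a minimal illustration of the idea, not part of the solution code below; `mask` and `spellable` are hypothetical names): the k-th occurrence of a letter gets its own bit, so checking whether a word is spellable from a tile set reduces to a single mask-subset test.

```python
from collections import defaultdict

def mask(word):
    """Hash a word so the k-th occurrence of letter c sets bit k*26 + (ord(c) - 97).
    Duplicate letters therefore occupy distinct bits."""
    counts = defaultdict(int)
    n = 0
    for c in word.lower():
        m = ord(c) - 97
        n |= 1 << (counts[c] * 26 + m)
        counts[c] += 1
    return n

def spellable(word, tiles):
    # A word can be spelled from a tile multiset iff its mask is a
    # subset of the tile mask: one AND instead of a Counter comparison.
    return (mask(word) & ~mask(tiles)) == 0

print(spellable("baa", "aab"))   # True: same multiset of letters
print(spellable("abba", "aab"))  # False: needs a second 'b'
```

This is why excluding one bit (one tile) from the running bitset partitions the word list so cheaply: every word whose mask contains the excluded bit drops out in a single bitwise test.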

from collections import defaultdict
import time

def as_number(word):
    word = word.lower()
    n = 0
    for c in word:
        m = ord(c) - 97
        if n & (1 << m):
            return 0
        else:
            n |= 1 << m
    return n

def as_number_with_duplicates(word):
    counts = defaultdict(lambda: 0)
    word = word.lower()
    n = 0
    for c in word:
        m = ord(c) - 97
        n |= 1 << (counts[c] * 26 + m)
        counts[c] += 1
    return n

def get_letters(n):
    letters = ""
    i = 0
    while n:
        if n & 1:
            letters += chr(97 + i % 26)
        n >>= 1
        i += 1
    return letters

def add_to_freq(freq, bits):
    i = 0
    while bits:
        if bits & 1:
            freq[i] += 1
        bits >>= 1
        i += 1

def arrange_freq(freq):
    return sorted([(x, i) for i, x in enumerate(freq)])

def get_tuple(tpl, i, k, seen, num_to_count_map):
    bits = tpl[2]["bits"]
    freq = [0] * len(tpl[2]["freq"])
    nums = tpl[2]["nums"]

    bit = 1 << i
    new_bits = bits ^ bit

    if new_bits in seen:
        return None

    seen.add(new_bits)

    new_nums = []
    count = 0
    for num in nums:
        if not (num & bit):
            new_nums.append(num)
            count += num_to_count_map[num]
            add_to_freq(freq, num)
    return (count, tpl[1] - 1, {
        "bits": new_bits,
        "freq": arrange_freq(freq),
        "nums": new_nums
    })

def upper_index(a, val, left, right):
    if left == right:
        return left
    mid = (left + right + 1) // 2

    if a[mid][0] > val:
        return upper_index(a, val, left, mid-1)
    else:
        return upper_index(a, val, mid, right)

# tpl = (word_count, letter_count, {bits, freq, nums})
def solve(tpl, k, seen, num_to_count_map, global_best):
    if tpl[1] == k:
        print("cand: %s, %s, %s" % (get_letters(tpl[2]["bits"]), tpl[1], tpl[0]))
        if tpl[0] > global_best["count"]:
            global_best["count"] = tpl[0]
            global_best["bits"] = tpl[2]["bits"]

    if tpl[1] > k:
        freq = tpl[2]["freq"]
        start = 1 + upper_index(freq, 0, 0, len(freq))
        for idx in range(start, len(freq)):
            if time.time() > global_best["end"]:
                return

            _freq, i = freq[idx]
            new_tpl = get_tuple(tpl, i, k, seen, num_to_count_map)

            # Prune
            if not new_tpl or new_tpl[0] < global_best["count"]:
                continue

            solve(new_tpl, k, seen, num_to_count_map, global_best)

def f(words, k):
    num_to_count_map = defaultdict(lambda: 0)
    nums = set()
    all_bits = 0
    freq = []

    for w in words:
        if len(w) > k:
            continue
        num = as_number_with_duplicates(w)
        nums.add(num)
        num_bits = len(bin(num)[2:])
        if len(freq) < num_bits:
            freq.extend([0] * (num_bits - len(freq)))
        add_to_freq(freq, num)
        num_to_count_map[num] += 1
        all_bits |= num

    tpl = (len(words), bin(all_bits).count("1"), {"bits": all_bits, "freq": arrange_freq(freq), "nums": nums})

    global_best = {
        "count": -float('inf'),
        "bits": 0,
        "end": time.time() + (5*60 if k == 22 else 20)
    }

    solve(tpl, k, set([all_bits]), num_to_count_map, global_best)

    return global_best

def format(w):
    return "".join(sorted(w.lower()))

def are_equal(w1, w2):
    return format(w1) == format(w2)

"""
import os
import datetime
path = "enable.txt"
words = []
with open(path) as file:
    for values in file:
        words.append(values.strip())

start = datetime.datetime.now()
for i in range(2, 31):
    result = f(words, i)
    formatted = format(get_letters(result["bits"]))
    print(result["count"], formatted)
    print(are_equal(formatted, btilly[i]))
print(datetime.datetime.now() - start)
"""
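As a side note, the frequency prioritization that both this answer and btilly's rely on can be demonstrated on a toy word list (an illustrative sketch; `positional_keys` is a hypothetical helper, not from either answer): rank (letter, occurrence-index) keys by how many words use them, so the search considers high-payoff tiles first.

```python
from collections import Counter

def positional_keys(words):
    # Count (letter, k) pairs: ('e', 0) is a word's first 'e', ('e', 1) its second.
    key_count = Counter()
    for word in words:
        seen = Counter()
        for letter in word:
            key_count[(letter, seen[letter])] += 1
            seen[letter] += 1
    # Most frequent keys first -- the order in which tiles are considered.
    return [k for k, _ in key_count.most_common()]

keys = positional_keys(["tea", "eat", "ate", "tee", "baa"])
print(keys[:3])  # the three first-occurrence keys shared by most words
```

On this toy list the first-occurrence keys for 't', 'e', and 'a' each appear in four words and sort to the front, while rare keys like a second 'a' or second 'e' sort last, which is exactly why pruning them early discards few words.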

# Code by btilly

import re
import os
import datetime
path = "enable.txt"
words = []
with open(path) as file:
    for values in file:
        words.append(values.strip().lower())

key_count = {}
for word in words:
    seen = {}
    for letter in word:
        if letter not in seen:
            seen[letter] = 0
        key = (letter, seen[letter])
        if key not in key_count:
            key_count[key] = 1
        else:
            key_count[key] += 1
        seen[letter] += 1


KEYS = sorted(key_count.keys(), key=lambda key: -key_count[key])
#print(KEYS)
#print(len(KEYS))
KEY_POS = {}
for i in range(len(KEYS)):
    KEY_POS[KEYS[i]] = i

# Now we will build a trie.  Every node has a list of words, and a dictionary
# mapping each next key to a deeper trie node.
# THE TRICK: we map each word to a sequence of numbers that index into KEYS.
# Because KEYS is sorted by frequency, rare keys such as a second 'e' get
# high indexes, which lets us store and prune those branches efficiently.
class Trie:
    def __init__(self, path):
        self.words = []
        self.dict = {}
        self.words_len = 0
        self.count_words = 0
        self.path = path

    def add_word (self, word):
        trie = self

        poses = []
        seen = {}
        for letter in word:
            if letter not in seen:
                seen[letter] = 0
            key = (letter, seen[letter])
            poses.append(KEY_POS[key])
            seen[letter] += 1
        sorted_poses = sorted(poses)
        for i in range(len(sorted_poses)):
            trie.count_words += 1
            pos = sorted_poses[i]
            if pos not in trie.dict:
                trie.dict[pos] = Trie(trie.path + KEYS[pos][0])
            trie = trie.dict[pos]
        trie.count_words += 1
        trie.words_len += 1
        trie.words.append(word)

    def remove (self, pos):
        trie = Trie(self.path)
        trie.words = None
        trie.dict = self.dict.copy()
        trie.words_len = self.words_len
        trie.count_words = self.count_words
        trie.path = self.path
        if pos in trie.dict:
            trie.count_words -= trie.dict[pos].count_words
            trie.dict.pop(pos)
        return trie

    def merge (self, other):
        trie = Trie(self.path)
        trie.words = None
        trie.words_len = self.words_len + other.words_len
        trie.count_words = self.count_words + other.count_words
        trie.path = self.path
        trie.dict = self.dict.copy()
        for pos, subtrie in other.dict.items():
            if pos in trie.dict:
                trie.dict[pos] = trie.dict[pos].merge(subtrie)
            else:
                trie.dict[pos] = subtrie
        return trie

    def __repr__ (self):
        answer = self.path + ":\n  words_len: " + str(self.words_len) + "\n  count_words:" + str(self.count_words) + " \n  dict:\n"
        def indent (string):
            p = re.compile("^", re.M)
            return p.sub("    ", string)
        for pos in sorted(self.dict.keys()):
            subtrie = self.dict[pos]
            answer = answer + indent(str(pos) + ":\n" + subtrie.__repr__())
        return answer

base_trie = Trie('')
for word in words:
    base_trie.add_word(word)

base_tries = [base_trie]
last_trie = base_trie
for i in range(len(KEYS)):
    subtrie = last_trie.dict[i]
    last_trie = last_trie.remove(i).merge(subtrie)
    base_tries.append(last_trie)

#print(base_tries[1].__repr__())

def best_solution (size):
    def solve (subset, pos, best, partial):
        found = sum(x[0] for x in partial)
        upper_bound = sum(x[1] for x in partial)
        if size <= len(subset) or upper_bound < best or len(KEYS) <= pos:
            return (found, subset)
        if best < found:
            best = found

        # Figure out our next calculations.
        partial_include = []
        partial_exclude = []
        finalized_found = 0
        for this_found, this_bound, this_trie in partial:
            if this_trie is None:
                # This is a generic record of already emptied tries
                finalized_found += this_found
            elif pos in this_trie.dict:
                include_trie = this_trie.dict[pos]
                partial_include.append((
                    this_found + include_trie.words_len,
                    include_trie.count_words + this_found,
                    include_trie
                ))
                # We included the tally of found words in the previous partial.
                # So do not double-count by including it again
                partial_include.append((
                    0,
                    this_bound - include_trie.count_words - this_found,
                    this_trie
                ))
                partial_exclude.append((
                    this_found,
                    this_bound - include_trie.count_words,
                    this_trie
                ))
            elif this_found == this_bound:
                finalized_found += this_found
            else:
                partial_include.append((
                    this_found,
                    this_bound,
                    this_trie
                ))

                partial_exclude.append((
                    this_found,
                    this_bound,
                    this_trie
                ))
        if 0 < finalized_found:
            partial_include.append(
                (finalized_found, finalized_found, None)
            )

            partial_exclude.append(
                (finalized_found, finalized_found, None)
            )

        found_include, subset_include = solve(subset + [pos], pos+1, best, partial_include)
        if best < found_include:
            best = found_include
        found_exclude, subset_exclude = solve(subset, pos+1, best, partial_exclude)
        if found_include < found_exclude:
            return (found_exclude, subset_exclude)
        else:
            return (found_include, subset_include)


    best_count = 0
    best_subset = []
    start_subset = [i for i in range(size)]
    while 0 < len(start_subset):
        trie = base_tries[len(start_subset)].remove(start_subset[-1]+1)
        count, subset = solve(list(start_subset), start_subset[-1]+2, best_count, [(trie.words_len, trie.count_words, trie)])
        if best_count < count:
            best_count = count
            best_subset = subset
        start_subset.pop()
    return ''.join([KEYS[x][0] for x in best_subset])

# End code by btilly

for i in range(25, len(KEYS)):
    start = datetime.datetime.now()
    btilly = best_solution(i)
    print(btilly)
    print(datetime.datetime.now() - start)
    start = datetime.datetime.now()
    result = f(words, i)
    formatted = format(get_letters(result["bits"]))
    print(formatted)
    print(datetime.datetime.now() - start)
    print(are_equal(btilly, formatted))
    print("")