Count the number of words that appear before

Question

I'd like to ask how we can count the number of words in a trie that come alphabetically before a given string.

Here is my current implementation.

class TrieNode:
    # Trie node class
    def __init__(self):
        self.children = [None] * 26
        # isEndOfWord is True if the node represents the end of a word
        self.isEndOfWord = False
        self.word_count = 0

class Trie:
    # Trie data structure class
    def __init__(self):
        self.root = self.getNode()

    def getNode(self):
        # Returns new trie node (initialized to NULLs)
        return TrieNode()

    def _charToIndex(self, ch):
        # private helper function
        # Converts key current character into index
        # use only 'a' through 'z' and lower case
        return ord(ch) - ord('a')

    def insert(self, key):
        # If not present, inserts key into trie
        # If the key is prefix of trie node,
        # just marks leaf node
        pCrawl = self.root
        length = len(key)
        for level in range(length):
            index = self._charToIndex(key[level])
            # if current character is not present
            if not pCrawl.children[index]:
                pCrawl.children[index] = self.getNode()
            pCrawl = pCrawl.children[index]
        # mark last node as leaf
        pCrawl.isEndOfWord = True
        pCrawl.word_count += 1

    def search(self, key):
        # Search key in the trie
        # Returns True if the key is present
        # in the trie, else False
        pCrawl = self.root
        length = len(key)
        for level in range(length):
            index = self._charToIndex(key[level])
            if not pCrawl.children[index]:
                return False
            pCrawl = pCrawl.children[index]
        return pCrawl is not None and pCrawl.isEndOfWord

    def count_before(self, string):
        cur = self.root
        for b in string:
            index = self._charToIndex(b)
            print(index)
            cur = cur.children[index]
            if cur is None:
                return 0
        return cur.word_count

def total_before(text):
    t = Trie()
    for i in range(len(text)):
        t.insert(text[i])
    
    a_list = []  # for each text[i], the number of words that sort before it
    for i in range(len(text)):
        result = t.count_before(text[i])
        a_list.append(result)
    return a_list

total_before(["bac", "aaa", "baa", "aac"]) # Expected output: [3, 0, 2, 1]

I want to know how to count the number of words that come before a given string in the trie I built. Could someone give me an idea?

Answer 1

I think you are overcomplicating the problem.

def total_before(lst):
    return [sorted(lst).index(el) for el in lst]

print(total_before(["bac", "aaa", "baa", "aac"]))

Output:

[3, 0, 2, 1]
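
Note that the comprehension above re-sorts the list for every element. A minimal sketch of the same idea that sorts once and then uses binary search from the standard bisect module; the name total_before_sorted is an assumption for illustration and is not part of the original answer:

```python
from bisect import bisect_left

def total_before_sorted(lst):
    # Sort once, then binary-search each element's first position.
    ordered = sorted(lst)
    # bisect_left returns how many items in ordered are strictly smaller than el.
    return [bisect_left(ordered, el) for el in lst]

print(total_before_sorted(["bac", "aaa", "baa", "aac"]))  # [3, 0, 2, 1]
```

Like list.index, bisect_left lands on the first occurrence, so duplicate words get the same count.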

Answer 2

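The way word_count is currently maintained, it does not serve much purpose: it is only non-zero at nodes where isEndOfWord is set to True. It would be more useful if it counted the number of words that depend on the current node, i.e. words that either end at that node (which your code counts now) or continue further down the trie (which are currently not counted).

To make that happen, also increment word_count while descending the trie: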

    def insert(self, key):
        pCrawl = self.root
        length = len(key)
        for level in range(length):
            pCrawl.word_count += 1   # <-------------- added
            index = self._charToIndex(key[level])
            if not pCrawl.children[index]:
                pCrawl.children[index] = self.getNode()
            pCrawl = pCrawl.children[index]
        pCrawl.isEndOfWord = True
        pCrawl.word_count += 1
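
To see what word_count holds after this change, here is a small self-contained sketch. It assumes the same node layout as in the question, but writes insert as a free function purely for brevity; that form is an assumption for illustration, not the original class method:

```python
class TrieNode:
    def __init__(self):
        self.children = [None] * 26
        self.isEndOfWord = False
        self.word_count = 0

def insert(root, key):
    # Same logic as the modified insert above, written as a free function.
    node = root
    for ch in key:
        node.word_count += 1              # the word passes through this node
        index = ord(ch) - ord('a')
        if not node.children[index]:
            node.children[index] = TrieNode()
        node = node.children[index]
    node.isEndOfWord = True
    node.word_count += 1                  # the word ends at this node

root = TrieNode()
for word in ["bac", "aaa", "baa", "aac"]:
    insert(root, word)

print(root.word_count)               # 4: every word passes through the root
print(root.children[0].word_count)   # 2: "aaa" and "aac" start with 'a'
print(root.children[1].word_count)   # 2: "bac" and "baa" start with 'b'
```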

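In count_before, you need to sum the word_count values of all child nodes that come before the child you select, because those represent words that sort before the current word: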

    def count_before(self, string):
        count = 0  # used to accumulate the word_counts
        cur = self.root
        for b in string:
            index = self._charToIndex(b)
            # add the word counts of the children that are to the left of this index:
            count += sum(node.word_count for node in cur.children[:index] if node)
            cur = cur.children[index]
            if cur is None:
                break
        return count
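
One case worth checking: if the trie contains a word that is a proper prefix of the query (for example "a" when querying "ab"), that word also sorts before the query, but it ends at a node on the path rather than in a subtree to the left, so the loop above does not count it. The following is a sketch, not part of the original answer, that adds those words. It assumes the modified insert shown earlier and derives the number of words ending at a node as its word_count minus the sum of its children's word_count; the free-function form and the helper name ends_here are assumptions for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = [None] * 26
        self.word_count = 0   # words ending at or passing through this node

def insert(root, key):
    node = root
    for ch in key:
        node.word_count += 1
        index = ord(ch) - ord('a')
        if node.children[index] is None:
            node.children[index] = TrieNode()
        node = node.children[index]
    node.word_count += 1

def ends_here(node):
    # Words ending exactly at this node: total minus those continuing below.
    return node.word_count - sum(c.word_count for c in node.children if c)

def count_before(root, string):
    count = 0
    cur = root
    for ch in string:
        index = ord(ch) - ord('a')
        # Words ending at cur are proper prefixes of string, so they sort first.
        count += ends_here(cur)
        # Words in subtrees to the left of the chosen child also sort first.
        count += sum(n.word_count for n in cur.children[:index] if n)
        cur = cur.children[index]
        if cur is None:
            break
    return count

root = TrieNode()
for word in ["a", "ab", "b"]:
    insert(root, word)

print(count_before(root, "ab"))  # 1: only "a" sorts before "ab"
```

For the example in the question all words have the same length, so no word is a prefix of another and both versions return the same counts.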

This line:

count += sum(node.word_count for node in cur.children[:index] if node)

is a compact way of writing:

mysum = 0
for node in cur.children[:index]:
    if node:
        mysum += node.word_count
count += mysum
