Count the number of words that appear before a given string in a trie
How can I count the number of words that come alphabetically before a given string in a trie?
This is my implementation so far.
class TrieNode:
    # Trie node class
    def __init__(self):
        self.children = [None] * 26
        # isEndOfWord is True if node represent the end of the word
        self.isEndOfWord = False
        self.word_count = 0

class Trie:
    # Trie data structure class
    def __init__(self):
        self.root = self.getNode()

    def getNode(self):
        # Returns new trie node (initialized to NULLs)
        return TrieNode()

    def _charToIndex(self, ch):
        # private helper function
        # Converts key current character into index
        # use only 'a' through 'z' and lower case
        return ord(ch) - ord('a')

    def insert(self, key):
        # If not present, inserts key into trie
        # If the key is prefix of trie node,
        # just marks leaf node
        pCrawl = self.root
        length = len(key)
        for level in range(length):
            index = self._charToIndex(key[level])
            # if current character is not present
            if not pCrawl.children[index]:
                pCrawl.children[index] = self.getNode()
            pCrawl = pCrawl.children[index]
        # mark last node as leaf
        pCrawl.isEndOfWord = True
        pCrawl.word_count += 1

    def search(self, key):
        # Search key in the trie
        # Returns true if key presents
        # in trie, else false
        pCrawl = self.root
        length = len(key)
        for level in range(length):
            index = self._charToIndex(key[level])
            if not pCrawl.children[index]:
                return False
            pCrawl = pCrawl.children[index]
        return pCrawl is not None and pCrawl.isEndOfWord

    def count_before(self, string):
        cur = self.root
        for b in string:
            index = self._charToIndex(b)
            print(index)
            cur = cur.children[index]
            if cur is None:
                return 0
        return cur.word_count

def total_before(text):
    t = Trie()
    for i in range(len(text)):
        t.insert(text[i])
    a_list = []  # A list to store the result that occur before the text[i]
    for i in range(len(text)):
        result = t.count_before(text[i])
        a_list.append(result)
    return a_list

total_before(["bac", "aaa", "baa", "aac"])  # Output will be [3, 0, 2, 1]
I would like to know how to count the number of words that come before a given string in the trie I have built. Can anyone give me an idea?
I think you are overcomplicating the problem.
def total_before(lst):
    return [sorted(lst).index(el) for el in lst]

print(total_before(["bac", "aaa", "baa", "aac"]))
Output:
[3, 0, 2, 1]
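Note that the one-liner calls sorted(lst) again for every element. For longer lists, a possible variant sorts once and finds each word's rank by binary search. This is only a sketch: the name total_before_bisect is made up for illustration. bisect_left returns the number of elements strictly smaller than the word, so duplicates get the same rank as with sorted(lst).index(el).

```python
from bisect import bisect_left

def total_before_bisect(words):
    # Hypothetical helper: sort once, then binary-search each word's first position.
    ordered = sorted(words)
    return [bisect_left(ordered, w) for w in words]

print(total_before_bisect(["bac", "aaa", "baa", "aac"]))  # [3, 0, 2, 1]
```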
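As word_count is currently initialised, it is not of much use: it is only non-zero at nodes where isEndOfWord is set to True. It would be more useful if it counted the number of words that depend on the current node, that is, words that either end at that node (which your code counts now) or continue further down the trie (which is currently not counted).
To make that happen, also increment word_count while descending the trie: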
def insert(self, key):
    pCrawl = self.root
    length = len(key)
    for level in range(length):
        pCrawl.word_count += 1  # <-------------- added
        index = self._charToIndex(key[level])
        if not pCrawl.children[index]:
            pCrawl.children[index] = self.getNode()
        pCrawl = pCrawl.children[index]
    pCrawl.isEndOfWord = True
    pCrawl.word_count += 1
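To see what this change does to the counters, here is a small standalone sketch. The Node class and the free function insert are simplified stand-ins for the question's TrieNode and Trie.insert, not the original code. After inserting the four example words, every node holds the number of words that pass through it or end at it.

```python
class Node:
    def __init__(self):
        self.children = [None] * 26
        self.word_count = 0  # words passing through or ending at this node

def insert(root, key):
    node = root
    for ch in key:
        node.word_count += 1          # count the word on the way down
        i = ord(ch) - ord('a')        # assumes lowercase 'a'..'z' only
        if node.children[i] is None:
            node.children[i] = Node()
        node = node.children[i]
    node.word_count += 1              # count the word at its final node

root = Node()
for w in ["bac", "aaa", "baa", "aac"]:
    insert(root, w)

print(root.word_count)              # 4: every word passes through the root
print(root.children[0].word_count)  # 2: "aaa" and "aac" start with "a"
print(root.children[1].word_count)  # 2: "bac" and "baa" start with "b"
```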
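In count_before you then need to sum the word_count values of all child nodes that come before the child you are about to select, because they represent words that sort before the current word: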
def count_before(self, string):
    count = 0  # used to accumulate the word_counts
    cur = self.root
    for b in string:
        index = self._charToIndex(b)
        # add the word counts of the children that are to the left of this index:
        count += sum(node.word_count for node in cur.children[:index] if node)
        cur = cur.children[index]
        if cur is None:
            break
    return count
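One case this loop does not cover: a stored word that is a proper prefix of the query (for example "a" before "ab") sits on the path itself rather than to the left of it, so it is never added. The sketch below shows one possible way to handle that. It assumes the modified insert above; the end_count field and the standalone function names are hypothetical additions, not part of the original answer.

```python
class TrieNode:
    def __init__(self):
        self.children = [None] * 26
        self.word_count = 0  # words passing through or ending here
        self.end_count = 0   # assumed extra field: words ending exactly here

def insert(root, key):
    node = root
    for ch in key:
        node.word_count += 1
        index = ord(ch) - ord('a')  # assumes lowercase 'a'..'z' only
        if node.children[index] is None:
            node.children[index] = TrieNode()
        node = node.children[index]
    node.word_count += 1
    node.end_count += 1

def count_before(root, string):
    count = 0
    cur = root
    for ch in string:
        # Words ending at cur are proper prefixes of string, so they sort first.
        count += cur.end_count
        index = ord(ch) - ord('a')
        # Subtrees to the left of the chosen child hold smaller words.
        count += sum(n.word_count for n in cur.children[:index] if n)
        cur = cur.children[index]
        if cur is None:
            break
    return count

def total_before(words):
    root = TrieNode()
    for w in words:
        insert(root, w)
    return [count_before(root, w) for w in words]

print(total_before(["bac", "aaa", "baa", "aac"]))  # [3, 0, 2, 1]
print(total_before(["ab", "a", "abc"]))            # [1, 0, 2]
```

Duplicates of the query word itself are not counted, which matches what sorted(lst).index(el) returns.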
This line:
    count += sum(node.word_count for node in cur.children[:index] if node)
is a compact way of writing:
    mysum = 0
    for node in cur.children[:index]:
        if node:
            mysum += node.word_count
    count += mysum