在 Haskell 中使用最小堆构建哈夫曼树
Building Huffman tree using min-heap in Haskell
我对此有很大的疑问。我不知道如何制作霍夫曼树,因为它是自下而上构建的(从 liefs 到 root)。
我是 Haskell 和函数式编程的新手。我看到有其他帖子与我的相似,但它们对我没有帮助。
这是我的代码
import Data.Map
type Value = Int
type Key = [Char]
type NodeValue = (Key,Value)
data Heap_ a = Empty
| Node a (Heap_ a) (Heap_ a)
deriving(Show, Eq)
type Heap a = Heap_ NodeValue
frequencyOfCharacters :: [Char] -> Map Key Value
frequencyOfCharacters [] = Data.Map.empty
frequencyOfCharacters (character:text) = insertWith (+) [character] 1 (frequencyOfCharacters text)
makeLeaf :: NodeValue -> Heap a
makeLeaf a = Node a Empty Empty
mergeHeaps :: Heap a -> Heap a -> Heap a
mergeHeaps Empty rightHeap = rightHeap
mergeHeaps leftHeap Empty = leftHeap
mergeHeaps leftHeap@(Node a lefta righta) rightHeap@(Node b leftb rightb)
| snd a < snd b = Node a (mergeHeaps lefta rightHeap) righta
| otherwise = Node b leftb (mergeHeaps leftHeap rightb)
addToHeap :: Heap a->NodeValue->Heap a
addToHeap Empty a = makeLeaf a
addToHeap h a = mergeHeaps h (makeLeaf a)
takeHeadFromHeap :: Heap a -> (NodeValue,Heap a)
takeHeadFromHeap Empty = (("",-1), Empty)
takeHeadFromHeap (Node a leftBranch rightBranch) = (a, mergeHeaps leftBranch rightBranch)
makeHeap :: Map Key Value -> Heap a
makeHeap map_ = makeHeap_ $ toList map_
makeHeap_ :: [(Key,Value)] -> Heap a
makeHeap_ [] = Empty
makeHeap_ (x:xs) = addToHeap (makeHeap_ xs) x
huffmanEntry :: [Char]-> Heap a
huffmanEntry text = makeHeap $ frequencyOfCharacters text
我正在考虑哈夫曼树的这个数据结构
data HuffmanTree h = Leaf [Char]
| NodeHuff [Char] (HuffmanTree h) (HuffmanTree h)
deriving(Show, Eq)
但是我不知道如何从最小堆中生成霍夫曼树。
ghci 最小堆中的这行代码由输入字符串组成
*Main> huffmanEntry "Aasdqweqweasd"
你需要做一个最小堆的霍夫曼树,你说 "I have no idea how to make Huffman Tree from min heap"。在开始编码之前,让我们弄清楚您需要做什么,尤其是使用您可能不熟悉的语言。
我想我们应该上网查一下制作霍夫曼树的方法。关于霍夫曼编码的维基百科页面怎么样? (https://en.wikipedia.org/wiki/Huffman_coding)
The simplest construction algorithm uses a priority queue where the
node with lowest probability is given highest priority:
- Create a leaf node for each symbol and add it to the priority queue.
- While there is more than one node in the queue:
- Remove the two nodes of highest priority (lowest probability) from
the queue
- Create a new internal node with these two nodes as
children and with probability equal to the sum of the two nodes'
probabilities.
- Add the new node to the queue.
- The remaining node is the root node and the tree is complete.
您已经有了代码来查找给定字符串中每个符号的频率 - 这就是您的 frequencyOfCharacters
函数。
您现在只需要一个优先队列!你绝对可以找到一种使用最小堆实现优先级队列的方法。
我希望这可以帮助您将逻辑拼凑起来。
如果您想逐步解决问题,为什么不先尝试使用优先级队列的工作实现来制作哈夫曼树 (http://hackage.haskell.org/package/PSQueue)?
完成后,您可以尝试使用最小堆 (http://hackage.haskell.org/package/heap) 的有效实现,将这个现成的模块替换为您自己的小型队列模块。
最后,你可以自己写一个准系统的最小堆模块(你已经有很多代码了)并用它来替换外部堆模块。
更新: 关于如何构建树的一些更具体的建议。这需要一些设置,所以请耐心等待。假设您有一个 Tree.hs
模块允许您使用二叉树:
module Tree where
-- Binary Tree
data Tree k v =
Empty
| Node (k, v) (Tree k v) (Tree k v)
deriving ( Show )
-- takes a (key, value) pair and returns a binary tree
-- containing one node with that pair
singleton :: (k, v) -> Tree k v
singleton = undefined
-- takes three things: a (key, value) pair, a binary tree t1
-- and another binary tree t2
-- then it constructs the tree
-- (key, val)
-- / \
-- t1 t2
joinWith :: (k, v) -> Tree k v -> Tree k v -> Tree k v
joinWith = undefined
-- returns the value associated with the (key, value) pair
-- stored in the root node of the binary tree
value :: Tree k v -> v
value = undefined
并且您还有一个 Queue.hs
模块可以让您使用优先级队列(我假设您有一个工作的最小堆模块)
module Queue where
import Heap
-- a priority queue
type Queue k v = Heap k v
-- returns an empty queue
empty :: (Ord v) => Queue k v
empty = undefined
-- adds a (key, value) pair to the queue and returns a
-- new copy of the queue containing the inserted pair
enqueue :: (Ord v) => (k, v) -> Queue k v -> Queue k v
enqueue = undefined
-- removes the lowest-value (key, value) pair from the queue
-- and returns a tuple consisting of the removed pair
-- and a copy of the queue with the pair removed
dequeue :: (Ord v) => Queue k v -> ((k, v), Queue k v)
dequeue = undefined
-- returns the number of elements in the queue
size :: (Ord v) => Queue k v -> Int
size = undefined
那么这就是您尝试使用您可以使用的工具制作 Huffman.hs
模块的方法。
module Huffman where
import Queue
import Tree
type HuffmanTree = Tree Char Int
-- takes a list of (character, frequency) pairs and turns them into
-- a Huffman Tree
makeHuffmanTree :: [(Char, Int)] -> HuffmanTree
makeHuffmanTree pairs = let
nodeList = map (\pair -> (singleton pair, snd pair)) pairs
nodeQueue = foldr enqueue empty nodeList
in
reduceNodes nodeQueue
-- takes pairs of nodes from the queue and combines them
-- till only one node containing the full Huffman Tree is
-- present in the queue
-- then this last node is dequeued and returned
reduceNodes :: Queue HuffmanTree Int -> HuffmanTree
reduceNodes q
| size q == 0 = error "no nodes!"
| size q == 1 = fst (fst (dequeue q))
| otherwise = let
((tree1, freq1), q') = dequeue q
((tree2, freq2), q'') = dequeue q'
freqSum = freq1 + freq2
newTree = joinWith ('.', freqSum) tree1 tree2
in
reduceNodes (enqueue (newTree, freqSum) q'')
检查类型后,我成功地编译了一个包含这些模块的堆栈项目。当您认为自己拥有所需的哈夫曼树构建代码时,只需将 undefined
函数填入它们实际应该做的事情就可以了!
我对此有很大的疑问。我不知道如何制作霍夫曼树,因为它是自下而上构建的(从 liefs 到 root)。
我是 Haskell 和函数式编程的新手。我看到有其他帖子与我的相似,但它们对我没有帮助。
这是我的代码
import Data.Map
type Value = Int
type Key = [Char]
type NodeValue = (Key,Value)
data Heap_ a = Empty
| Node a (Heap_ a) (Heap_ a)
deriving(Show, Eq)
type Heap a = Heap_ NodeValue
frequencyOfCharacters :: [Char] -> Map Key Value
frequencyOfCharacters [] = Data.Map.empty
frequencyOfCharacters (character:text) = insertWith (+) [character] 1 (frequencyOfCharacters text)
makeLeaf :: NodeValue -> Heap a
makeLeaf a = Node a Empty Empty
mergeHeaps :: Heap a -> Heap a -> Heap a
mergeHeaps Empty rightHeap = rightHeap
mergeHeaps leftHeap Empty = leftHeap
mergeHeaps leftHeap@(Node a lefta righta) rightHeap@(Node b leftb rightb)
| snd a < snd b = Node a (mergeHeaps lefta rightHeap) righta
| otherwise = Node b leftb (mergeHeaps leftHeap rightb)
addToHeap :: Heap a->NodeValue->Heap a
addToHeap Empty a = makeLeaf a
addToHeap h a = mergeHeaps h (makeLeaf a)
takeHeadFromHeap :: Heap a -> (NodeValue,Heap a)
takeHeadFromHeap Empty = (("",-1), Empty)
takeHeadFromHeap (Node a leftBranch rightBranch) = (a, mergeHeaps leftBranch rightBranch)
makeHeap :: Map Key Value -> Heap a
makeHeap map_ = makeHeap_ $ toList map_
makeHeap_ :: [(Key,Value)] -> Heap a
makeHeap_ [] = Empty
makeHeap_ (x:xs) = addToHeap (makeHeap_ xs) x
huffmanEntry :: [Char]-> Heap a
huffmanEntry text = makeHeap $ frequencyOfCharacters text
我正在考虑哈夫曼树的这个数据结构
data HuffmanTree h = Leaf [Char]
| NodeHuff [Char] (HuffmanTree h) (HuffmanTree h)
deriving(Show, Eq)
但是我不知道如何从最小堆中生成霍夫曼树。
ghci 最小堆中的这行代码由输入字符串组成
*Main> huffmanEntry "Aasdqweqweasd"
你需要做一个最小堆的霍夫曼树,你说 "I have no idea how to make Huffman Tree from min heap"。在开始编码之前,让我们弄清楚您需要做什么,尤其是使用您可能不熟悉的语言。
我想我们应该上网查一下制作霍夫曼树的方法。关于霍夫曼编码的维基百科页面怎么样? (https://en.wikipedia.org/wiki/Huffman_coding)
The simplest construction algorithm uses a priority queue where the node with lowest probability is given highest priority:
- Create a leaf node for each symbol and add it to the priority queue.
- While there is more than one node in the queue:
- Remove the two nodes of highest priority (lowest probability) from the queue
- Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
- Add the new node to the queue.
- The remaining node is the root node and the tree is complete.
您已经有了代码来查找给定字符串中每个符号的频率 - 这就是您的 frequencyOfCharacters
函数。
您现在只需要一个优先队列!你绝对可以找到一种使用最小堆实现优先级队列的方法。
我希望这可以帮助您将逻辑拼凑起来。
如果您想逐步解决问题,为什么不先尝试使用优先级队列的工作实现来制作哈夫曼树 (http://hackage.haskell.org/package/PSQueue)?
完成后,您可以尝试使用最小堆 (http://hackage.haskell.org/package/heap) 的有效实现,将这个现成的模块替换为您自己的小型队列模块。
最后,你可以自己写一个准系统的最小堆模块(你已经有很多代码了)并用它来替换外部堆模块。
更新: 关于如何构建树的一些更具体的建议。这需要一些设置,所以请耐心等待。假设您有一个 Tree.hs
模块允许您使用二叉树:
module Tree where
-- Binary Tree
data Tree k v =
Empty
| Node (k, v) (Tree k v) (Tree k v)
deriving ( Show )
-- takes a (key, value) pair and returns a binary tree
-- containing one node with that pair
singleton :: (k, v) -> Tree k v
singleton = undefined
-- takes three things: a (key, value) pair, a binary tree t1
-- and another binary tree t2
-- then it constructs the tree
-- (key, val)
-- / \
-- t1 t2
joinWith :: (k, v) -> Tree k v -> Tree k v -> Tree k v
joinWith = undefined
-- returns the value associated with the (key, value) pair
-- stored in the root node of the binary tree
value :: Tree k v -> v
value = undefined
并且您还有一个 Queue.hs
模块可以让您使用优先级队列(我假设您有一个工作的最小堆模块)
module Queue where
import Heap
-- a priority queue
type Queue k v = Heap k v
-- returns an empty queue
empty :: (Ord v) => Queue k v
empty = undefined
-- adds a (key, value) pair to the queue and returns a
-- new copy of the queue containing the inserted pair
enqueue :: (Ord v) => (k, v) -> Queue k v -> Queue k v
enqueue = undefined
-- removes the lowest-value (key, value) pair from the queue
-- and returns a tuple consisting of the removed pair
-- and a copy of the queue with the pair removed
dequeue :: (Ord v) => Queue k v -> ((k, v), Queue k v)
dequeue = undefined
-- returns the number of elements in the queue
size :: (Ord v) => Queue k v -> Int
size = undefined
那么这就是您尝试使用您可以使用的工具制作 Huffman.hs
模块的方法。
module Huffman where
import Queue
import Tree
type HuffmanTree = Tree Char Int
-- takes a list of (character, frequency) pairs and turns them into
-- a Huffman Tree
makeHuffmanTree :: [(Char, Int)] -> HuffmanTree
makeHuffmanTree pairs = let
nodeList = map (\pair -> (singleton pair, snd pair)) pairs
nodeQueue = foldr enqueue empty nodeList
in
reduceNodes nodeQueue
-- takes pairs of nodes from the queue and combines them
-- till only one node containing the full Huffman Tree is
-- present in the queue
-- then this last node is dequeued and returned
reduceNodes :: Queue HuffmanTree Int -> HuffmanTree
reduceNodes q
| size q == 0 = error "no nodes!"
| size q == 1 = fst (fst (dequeue q))
| otherwise = let
((tree1, freq1), q') = dequeue q
((tree2, freq2), q'') = dequeue q'
freqSum = freq1 + freq2
newTree = joinWith ('.', freqSum) tree1 tree2
in
reduceNodes (enqueue (newTree, freqSum) q'')
检查类型后,我成功地编译了一个包含这些模块的堆栈项目。当您认为自己拥有所需的哈夫曼树构建代码时,只需将 undefined
函数填入它们实际应该做的事情就可以了!