霍夫曼编码的时间和 space 复杂度是多少?
What is the time and space complexity of Huffman encoding?
我已经通过使用(两个)Hashmaps 来存储每个唯一字符(存储为 hashmaps 中的键)频率和代码(存储为 hashmaps 中的值)来实现 Huffman 编码算法。我不确定如何确定时间和 space 复杂性,谁能帮我找出并解释这些是什么?
我的一些想法:
每个唯一字符在每个Hashmap中存储一次,所以space复杂度O(2*n)=O(n)?
我读到时间复杂度为 O(nlogn)(对于某些解决方案)但不明白为什么。
首先我从一个字符串构建一个频率图:
/**
* builds a list of the occurring characters in a text and
* counts the frequency of these characters
*/
public void buildFrequencyMap(String string) {
frequencyMap = new HashMap<Character, Integer>(); //key, value
char[] stringArray = string.toCharArray();
for(char c : stringArray) {
if(frequencyMap.containsKey(c)) { //if the character has not been stored yet, store it
frequencyMap.put(c, frequencyMap.get(c) + 1);
} else {
frequencyMap.put(c, 1);
}
}
}
然后我构建树:
/**
* builds the huffman tree
*/
public Node buildTree() {
PriorityQueue<Node> queue = new PriorityQueue<Node>();
//fill the queue with nodes constructed from a character and its frequency
for(char i = 0; i < 256; i++ ) { //256 - size of ASCII alphabet
if(frequencyMap.containsKey(i) && frequencyMap.get(i) > 0) {
queue.add(new Node(i, frequencyMap.get(i), null, null)); //create new leaf
}
}
//if the indata only consists of 1 single character
if(queue.size() == 1) {
queue.add(new Node('[=12=]', 1, null, null));
}
//otherwise
//continuously merge nodes (two with the lowest frequency) to build the tree until only one node remains --> becomes the root
while(queue.size() > 1) {
Node left = queue.poll(); //first extracted child becomes the left child
Node right = queue.poll(); //second extracted child becomes the right child
Node parent = new Node('[=12=]', (left.frequency + right.frequency), left, right);
queue.add(parent);
}
return root = queue.poll(); //the remaining node in the queue becomes the root
}
最后,codemap构建完成:
/**
* builds the codemap
*/
public void buildCodeMap() {
codeMap = new HashMap<Character, String>(); //key, value
buildCode(root, "", codeMap);
}
public void buildCode(Node node, String code, HashMap<Character, String> codeMap) {
if(!node.isLeaf()) { //if the current node is NOT a leaf
buildCode(node.leftChild, code + '0', codeMap); //each time we go down at the LEFT side of the tree, encode a 0
buildCode(node.rightChild, code + '1', codeMap); //each time we go down at the RIGHT side of the tree, encode a 1
} else { //otherwise
codeMap.put(node.character, code);
}
}
霍夫曼编码需要 O(n log n) 时间,除非频率已经排序,在这种情况下它需要 O( n) 次。 n是被编码的符号数。
请注意,插入、查找最小值或从优先级队列中删除它的操作中至少有一个是 O(log n)。哪一个取决于优先级队列的实现。
我已经通过使用(两个)Hashmaps 来存储每个唯一字符(存储为 hashmaps 中的键)频率和代码(存储为 hashmaps 中的值)来实现 Huffman 编码算法。我不确定如何确定时间和 space 复杂性,谁能帮我找出并解释这些是什么?
我的一些想法: 每个唯一字符在每个Hashmap中存储一次,所以space复杂度O(2*n)=O(n)?
我读到时间复杂度为 O(nlogn)(对于某些解决方案)但不明白为什么。
首先我从一个字符串构建一个频率图:
/**
* builds a list of the occurring characters in a text and
* counts the frequency of these characters
*/
public void buildFrequencyMap(String string) {
frequencyMap = new HashMap<Character, Integer>(); //key, value
char[] stringArray = string.toCharArray();
for(char c : stringArray) {
if(frequencyMap.containsKey(c)) { //if the character has not been stored yet, store it
frequencyMap.put(c, frequencyMap.get(c) + 1);
} else {
frequencyMap.put(c, 1);
}
}
}
然后我构建树:
/**
* builds the huffman tree
*/
public Node buildTree() {
PriorityQueue<Node> queue = new PriorityQueue<Node>();
//fill the queue with nodes constructed from a character and its frequency
for(char i = 0; i < 256; i++ ) { //256 - size of ASCII alphabet
if(frequencyMap.containsKey(i) && frequencyMap.get(i) > 0) {
queue.add(new Node(i, frequencyMap.get(i), null, null)); //create new leaf
}
}
//if the indata only consists of 1 single character
if(queue.size() == 1) {
queue.add(new Node('[=12=]', 1, null, null));
}
//otherwise
//continuously merge nodes (two with the lowest frequency) to build the tree until only one node remains --> becomes the root
while(queue.size() > 1) {
Node left = queue.poll(); //first extracted child becomes the left child
Node right = queue.poll(); //second extracted child becomes the right child
Node parent = new Node('[=12=]', (left.frequency + right.frequency), left, right);
queue.add(parent);
}
return root = queue.poll(); //the remaining node in the queue becomes the root
}
最后,codemap构建完成:
/**
* builds the codemap
*/
public void buildCodeMap() {
codeMap = new HashMap<Character, String>(); //key, value
buildCode(root, "", codeMap);
}
public void buildCode(Node node, String code, HashMap<Character, String> codeMap) {
if(!node.isLeaf()) { //if the current node is NOT a leaf
buildCode(node.leftChild, code + '0', codeMap); //each time we go down at the LEFT side of the tree, encode a 0
buildCode(node.rightChild, code + '1', codeMap); //each time we go down at the RIGHT side of the tree, encode a 1
} else { //otherwise
codeMap.put(node.character, code);
}
}
霍夫曼编码需要 O(n log n) 时间,除非频率已经排序,在这种情况下它需要 O( n) 次。 n是被编码的符号数。
请注意,插入、查找最小值或从优先级队列中删除它的操作中至少有一个是 O(log n)。哪一个取决于优先级队列的实现。