将字符串转换为位并将其写入 Java 中的文件中
Converting string to bit and write it in a file in Java
我写了一个关于霍夫曼压缩的程序,最终我可以操纵读取的字符串并将其更改为特定于霍夫曼。例如,对于输入的“abcdeffg”字符串,我可以获得“1001010000010101111011”字符串。我的问题是:我必须将这个字符串写入文件并保存。我也知道文件操作,但我不知道如何把这个字符串写成位。我现在从文件中读取的字符串“abcdeffg”是 8 个字节,而我创建的字符串“1001010000010101111011”是 22 个字节。因此,如果我将其写入文件,我将不会进行压缩。我怎样才能将这个由 0 和 1 组成的字符串写入文件以使 Huffman 压缩工作?我目前编写的代码如下;
Class 霍夫曼编码结果:
class HuffmanEncodedResult {
final Node root;
final String encodedData;
public HuffmanEncodedResult(final String encodedData, final Node root) {
this.encodedData = encodedData;
this.root = root;
}
public Node getRoot() {
return this.root;
}
public String getEncodedData() {
return this.encodedData;
}
}
Class霍夫曼编码器:
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
public class HuffmanEncoder {
private static final int LETTER_SIZE = 256;
public HuffmanEncodedResult compress(final String data){
final int[] freqTable = buildFreqTable(data);
final Node root = buildHuffmanTree(freqTable);
final Map<Character,String> lookupTable = buildLookupTable(root);
return new HuffmanEncodedResult(generateEncodedData(data, lookupTable), root);
}
private static String generateEncodedData(String data, Map<Character, String> lookupTable) {
final StringBuilder builder = new StringBuilder();
for(final char character : data.toCharArray()){
builder.append(lookupTable.get(character));
}
return builder.toString();
}
private static Map<Character,String> buildLookupTable(final Node root){
final Map<Character,String> lookupTable = new HashMap<>();
buildLookupTableImpl(root,"",lookupTable);
return lookupTable;
}
private static void buildLookupTableImpl(Node node, String s, Map<Character, String> lookupTable) {
if(!node.isLeaf()){
buildLookupTableImpl(node.leftChild,s + '0', lookupTable);
buildLookupTableImpl(node.rightChild,s + '1', lookupTable);
}else{
lookupTable.put(node.character,s);
}
}
private static Node buildHuffmanTree(int[] freq){
final PriorityQueue<Node> pq = new PriorityQueue<>();
for(char i = 0; i<LETTER_SIZE; i++){
if(freq[i] > 0){
pq.add(new Node(i,freq[i],null,null));
}
}
if(pq.size() == 1){
pq.add(new Node('[=12=]',0, null, null));
}
while (pq.size() > 1){
final Node left = pq.poll();
final Node right = pq.poll();
assert right != null;
assert left != null;
final Node parent = new Node('[=12=]',left.frequency + right.frequency, left, right);
pq.add(parent);
}
return pq.poll();
}
private static int[] buildFreqTable(final String data){
final int[] freq = new int[LETTER_SIZE];
for(final char character :data.toCharArray()){
freq[character]++;
}
return freq;
}
public String decompress(HuffmanEncodedResult result){
final StringBuilder resultBuilder = new StringBuilder();
Node current = result.getRoot();
int i = 0;
while(i<result.getEncodedData().length()){
while(!current.isLeaf()){
char bit = result.getEncodedData().charAt(i);
if(bit == '1'){
current = current.rightChild;
}else if(bit == '0'){
current = current.leftChild;
}else{
throw new IllegalArgumentException("Invalid Bit in Message! " + bit);
}
i++;
}
resultBuilder.append(current.character);
current = result.getRoot();
}
return resultBuilder.toString();
}
private byte[] toByteArray(String input){
//to charArray
char[] preBitChars = input.toCharArray();
int bitShortage = (8 - (preBitChars.length%8));
char[] bitChars = new char[preBitChars.length + bitShortage];
System.arraycopy(preBitChars, 0, bitChars, 0, preBitChars.length);
for (int i= 0; i < bitShortage; i++) {
bitChars[preBitChars.length + i]='0';
}
//to bytearray
byte[] byteArray = new byte[bitChars.length/8];
for(int i=0; i<bitChars.length; i++) {
if (bitChars[i]=='1'){
byteArray[byteArray.length - (i/8) - 1] |= 1<<(i%8);
}
}
return byteArray;
}
public static void main(String[] args) {
final String test = "abcdeffg";
final HuffmanEncoder encoder = new HuffmanEncoder();
final HuffmanEncodedResult result = encoder.compress(test);
System.out.println("Encoded Message: " + result.encodedData);
System.out.println("Unencoded Message: " + encoder.decompress(result));
}
}
Class节点:
public class Node implements Comparable<Node>{
protected final char character;
protected final int frequency;
protected final Node rightChild;
protected final Node leftChild;
public Node(char character, int frequency, Node leftChild, Node rightChild) {
this.character = character;
this.frequency = frequency;
this.rightChild = rightChild;
this.leftChild = leftChild;
}
public boolean isLeaf(){
return this.leftChild == null && this.rightChild == null;
}
@Override
public int compareTo(Node o) {
final int frequencyComparison = Integer.compare(this.frequency,o.frequency);
if(frequencyComparison != 0){
return frequencyComparison;
}
return Integer.compare(this.character,o.character);
}
}
控制台输出:
"C:\Program Files\Java\jdk-15.0.1\bin\java.exe" "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA 2020.3.1\lib\idea_rt.jar=62965:C:\Program Files\JetBrains\IntelliJ IDEA 2020.3.1\bin" -Dfile.encoding=UTF-8 -classpath C:\Users\ayber\Desktop\untitled\out\production\untitled HuffmanEncoder
Encoded Message: 1001010000010101111011
Unencoded Message: abcdeffg
Process finished with exit code 0
您需要使用 FileOutputStream.write
将数据写入字节数组,并将字节数组作为 1 和 0 字符串的二进制表示形式。
将字符串转换为位的一种方法是使用 BitSet
.
要创建正确大小的 BitSet,请使用:
BitSet b = new BitSet(string.length());
然后你可以遍历字符串并设置位:
for (int i = 0; i < string.length(); i++) {
if (string.charAt(i) == '1') {
b.set(i);
}
}
然后就可以将bitset转成byte数组写入文件:
byte[] bytes = b.toByteArray();
FileOutputStream f = new FileOutputStream(filename);
f.write(bytes);
但是,我会注意到创建一个字符串然后将其转换为位集然后将其转换为字节数组的效率不是很高。从一开始就使用位集可能会更好。
我写了一个关于霍夫曼压缩的程序,最终我可以操纵读取的字符串并将其更改为特定于霍夫曼。例如,对于输入的“abcdeffg”字符串,我可以获得“1001010000010101111011”字符串。我的问题是:我必须将这个字符串写入文件并保存。我也知道文件操作,但我不知道如何把这个字符串写成位。我现在从文件中读取的字符串“abcdeffg”是 8 个字节,而我创建的字符串“1001010000010101111011”是 22 个字节。因此,如果我将其写入文件,我将不会进行压缩。我怎样才能将这个由 0 和 1 组成的字符串写入文件以使 Huffman 压缩工作?我目前编写的代码如下;
Class 霍夫曼编码结果:
class HuffmanEncodedResult {
final Node root;
final String encodedData;
public HuffmanEncodedResult(final String encodedData, final Node root) {
this.encodedData = encodedData;
this.root = root;
}
public Node getRoot() {
return this.root;
}
public String getEncodedData() {
return this.encodedData;
}
}
Class霍夫曼编码器:
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
public class HuffmanEncoder {
private static final int LETTER_SIZE = 256;
public HuffmanEncodedResult compress(final String data){
final int[] freqTable = buildFreqTable(data);
final Node root = buildHuffmanTree(freqTable);
final Map<Character,String> lookupTable = buildLookupTable(root);
return new HuffmanEncodedResult(generateEncodedData(data, lookupTable), root);
}
private static String generateEncodedData(String data, Map<Character, String> lookupTable) {
final StringBuilder builder = new StringBuilder();
for(final char character : data.toCharArray()){
builder.append(lookupTable.get(character));
}
return builder.toString();
}
private static Map<Character,String> buildLookupTable(final Node root){
final Map<Character,String> lookupTable = new HashMap<>();
buildLookupTableImpl(root,"",lookupTable);
return lookupTable;
}
private static void buildLookupTableImpl(Node node, String s, Map<Character, String> lookupTable) {
if(!node.isLeaf()){
buildLookupTableImpl(node.leftChild,s + '0', lookupTable);
buildLookupTableImpl(node.rightChild,s + '1', lookupTable);
}else{
lookupTable.put(node.character,s);
}
}
private static Node buildHuffmanTree(int[] freq){
final PriorityQueue<Node> pq = new PriorityQueue<>();
for(char i = 0; i<LETTER_SIZE; i++){
if(freq[i] > 0){
pq.add(new Node(i,freq[i],null,null));
}
}
if(pq.size() == 1){
pq.add(new Node('[=12=]',0, null, null));
}
while (pq.size() > 1){
final Node left = pq.poll();
final Node right = pq.poll();
assert right != null;
assert left != null;
final Node parent = new Node('[=12=]',left.frequency + right.frequency, left, right);
pq.add(parent);
}
return pq.poll();
}
private static int[] buildFreqTable(final String data){
final int[] freq = new int[LETTER_SIZE];
for(final char character :data.toCharArray()){
freq[character]++;
}
return freq;
}
public String decompress(HuffmanEncodedResult result){
final StringBuilder resultBuilder = new StringBuilder();
Node current = result.getRoot();
int i = 0;
while(i<result.getEncodedData().length()){
while(!current.isLeaf()){
char bit = result.getEncodedData().charAt(i);
if(bit == '1'){
current = current.rightChild;
}else if(bit == '0'){
current = current.leftChild;
}else{
throw new IllegalArgumentException("Invalid Bit in Message! " + bit);
}
i++;
}
resultBuilder.append(current.character);
current = result.getRoot();
}
return resultBuilder.toString();
}
private byte[] toByteArray(String input){
//to charArray
char[] preBitChars = input.toCharArray();
int bitShortage = (8 - (preBitChars.length%8));
char[] bitChars = new char[preBitChars.length + bitShortage];
System.arraycopy(preBitChars, 0, bitChars, 0, preBitChars.length);
for (int i= 0; i < bitShortage; i++) {
bitChars[preBitChars.length + i]='0';
}
//to bytearray
byte[] byteArray = new byte[bitChars.length/8];
for(int i=0; i<bitChars.length; i++) {
if (bitChars[i]=='1'){
byteArray[byteArray.length - (i/8) - 1] |= 1<<(i%8);
}
}
return byteArray;
}
public static void main(String[] args) {
final String test = "abcdeffg";
final HuffmanEncoder encoder = new HuffmanEncoder();
final HuffmanEncodedResult result = encoder.compress(test);
System.out.println("Encoded Message: " + result.encodedData);
System.out.println("Unencoded Message: " + encoder.decompress(result));
}
}
Class节点:
public class Node implements Comparable<Node>{
protected final char character;
protected final int frequency;
protected final Node rightChild;
protected final Node leftChild;
public Node(char character, int frequency, Node leftChild, Node rightChild) {
this.character = character;
this.frequency = frequency;
this.rightChild = rightChild;
this.leftChild = leftChild;
}
public boolean isLeaf(){
return this.leftChild == null && this.rightChild == null;
}
@Override
public int compareTo(Node o) {
final int frequencyComparison = Integer.compare(this.frequency,o.frequency);
if(frequencyComparison != 0){
return frequencyComparison;
}
return Integer.compare(this.character,o.character);
}
}
控制台输出:
"C:\Program Files\Java\jdk-15.0.1\bin\java.exe" "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA 2020.3.1\lib\idea_rt.jar=62965:C:\Program Files\JetBrains\IntelliJ IDEA 2020.3.1\bin" -Dfile.encoding=UTF-8 -classpath C:\Users\ayber\Desktop\untitled\out\production\untitled HuffmanEncoder
Encoded Message: 1001010000010101111011
Unencoded Message: abcdeffg
Process finished with exit code 0
您需要使用 FileOutputStream.write
将数据写入字节数组,并将字节数组作为 1 和 0 字符串的二进制表示形式。
将字符串转换为位的一种方法是使用 BitSet
.
要创建正确大小的 BitSet,请使用:
BitSet b = new BitSet(string.length());
然后你可以遍历字符串并设置位:
for (int i = 0; i < string.length(); i++) {
if (string.charAt(i) == '1') {
b.set(i);
}
}
然后就可以将bitset转成byte数组写入文件:
byte[] bytes = b.toByteArray();
FileOutputStream f = new FileOutputStream(filename);
f.write(bytes);
但是,我会注意到创建一个字符串然后将其转换为位集然后将其转换为字节数组的效率不是很高。从一开始就使用位集可能会更好。