Huffman Coding

From bibbleWiki
Jump to navigation Jump to search

Introduction

A quick overview of huffman - bet I never read this again. This is a lossless compression algorithm and works by assigning a code to each byte which is smaller than the original byte. He achieves this by counting the occurrence of characters and assigning a code smaller than the source data e.g. a = 0 b = 01. You need to store the translation so you know when you see 01 it is a b and a 0 is a.

Approach

Our example will be "AABCBAD"

  • Count the frequency of each unique character
  • Identify the two least used at characters
  • Add nodes in order of lowest frequency to the parent node
  • Assign the bits, by convention left is 0
  • Assign codes to letters
  • Write out example using the codes

Huffman basically

Count the frequency of each unique character

This gives

A = 3
B = 2
C = 1
D = 1

Identify the two least used at characters.

This is clearly c and d. We make these the bottom nodes of our tree adding a parent node and placing the sum of the occurrences in the node

Add nodes in order of lowest frequency to the parent node

B is the lowest frequency to this gets added to the tree and parent node is added which is the sum of the children 4

Now the remaining A node.

Assign the bits, by convention left is 0

I have nothing to say about this picture right now

Assign codes to letters

We do this be writing down the edges writing the values. Perhaps C is the best example. 7->4 = 0 4->2 = 1 2->C = 0 = 010. So this gives a table of

A = 3 | 1
B = 2 | 00
C = 1 | 010
D = 1 | 011

Which you will notice the more frequent the smaller the number.

Write out example using the codes

Should be like this (no spaces obviously)

A  A  B   C   B   A  D
1  1  00  010 00  1  011

And that it done with the value 1100010001011.

Build a JavaScript Frequency Table

const buildFreqTable = (byteStream) => {
  const freq = new Map();

  for (const byte of byteStream) {
    freq.set(byte, (freq.get(byte) || 0) + 1);
  }

  return freq;
}

const bytes = new Uint8Array([1, 2, 2, 3, 255, 0, 2, 255]);
const freqTable = buildFreqTable(bytes);

for (const [byte, count] of freqTable.entries()) {
  console.log(`Byte ${byte}: ${count}`);
}