CISC 457 - Run Length Encoding

The idea is to encode runs of the same value by specifying only the value and the number of times it appears in the run.

This usually doesn't work well on arbitrary sequences, like

$5\ 1\ 1\ 1\ 0\ 5\ 0\ 7\ 8\ 5\ 0\ 0\ 0\ 2$

which would be encoded as

$1 \times 5, 3 \times 1, 1 \times 0, 1 \times 5, 1 \times 0, 1 \times 7, 1 \times 8, 1 \times 5, 3 \times 0, 1 \times 2$

Using Bit Planes

Instead, we can look at the individual bit-planes to find more repetition. Here are the four bit-planes for the sequence above: $\begin{array}{l|ccccccccccccc} & 5 & 1 & 1 & 1 & 0 & 5 & 0 & 7 & 8 & 5 & 0 & 0 & 2 \\ \hline \textrm{bit } 3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ \textrm{bit } 2 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\ \textrm{bit } 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\ \textrm{bit } 0 & 1 & 1 & 1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\ \end{array}$

Each plane could be encoded separately, with a special "end of line" marker on each, like this: $\begin{array}{l|ccccccccc} & 0s & 1s & 0s & 1s & 0s & 1s & 0s & 1s & 0s & \\ \hline \textrm{bit } 3 & 8 & 1 & X \\ \textrm{bit } 2 & 0 & 1 & 4 & 1 & 1 & 1 & 1 & 1 & X \\ \textrm{bit } 1 & 7 & 1 & 4 & X \\ \textrm{bit } 0 & 0 & 4 & 1 & 1 & 1 & 1 & 1 & 1 & X \\ \end{array}$

The first row, for example, shows $8 \times 0$ , then $1 \times 1$ , then the remainder of the line is $0$ .

Then the whole stream of tokens is $8\ 1\ X\ 0\ 1\ 4\ 1\ 1\ 1\ 1\ 1\ X\ 7\ 1\ 4\ X\ 0\ 4\ 1\ 1\ 1\ 1\ 1\ 1\ X$

Run length encoding is used in the mapper part of an encoder to reduce redundancy. We can take the stream of tokens and further compress is in the symbol coder part of the encoder using entropy encoding.

For example, if we Huffman-encode the above stream of tokens, we get these codes: $\begin{array}{c|ccccccccc} \textrm{token} & \textrm{encoding} \\ \hline X & 01 \\ 0 & 0001 \\ 1 & 1 \\ 2 \\ 3 \\ 4 & 001 \\ 5 \\ 6 \\ 7 & 00000 \\ 8 & 00001 \\ \end{array}$

And the final encoding is (with spaces added to show the separate symbols): $00001\ 1\ 01\ 0001\ 1\ 001\ 1\ 1\ 1\ 1\ 1\ 01\ 00000\ 1\ 001\ 01\ 0001\ 001\ 1\ 1\ 1\ 1\ 1\ 1\ 01$

The original encoding had 52 bits (4 bpp), while this encoding has 49 (3.77 bpp).

Note that the entropy, $H$ , of the original stream is 2.35 bpp. So we're not achieving very good compression!

Using Grey Codes

Images can often be characterized by gradual changes in big regions, plus places where there are quick changes. This isn't a universal characterization, of course, but the observation leads to another method of run-length encoding.

Grey codes are bit representations of numbers that, unlike binary codes, differ by at most one bit between adjacent numbers. Here, for example, are Grey codes for the numbers 0 to 8: $\begin{array}{c|ccccccccc} \textrm{token} & \textrm{encoding} \\ \hline 0 & 0000 \\ 1 & 0001 \\ 2 & 0011 \\ 3 & 0010 \\ 4 & 0110 \\ 5 & 0111 \\ 6 & 0101 \\ 7 & 0100 \\ 8 & 1100 \\ \end{array}$

Grey codes could be using instead of binary to provide more coherence in each bit plane, but only if values change gradually.