Math Alive

Math Alive

Welcome Page

Problem Set. Error Correction and Compression, Part s2.

You can answer by filling in the blank spaces. If there is not enough space attach other sheets.

In this problem set we will illustrate two compression schemes. The first one is (a very simple version of) the Lempel-Ziv algorithm. This is a scheme that can be used to compress any type of file (text or other), and it can be implemented in many ways, and using many different alphabets. We shall work with normal text and learn about a naive LZ algorithm. We start with some exercises to understand how it works.

Problem 1. Lempel-Ziv Parsing.
The first step in the algorithm is the parsing process. The parsing is explained on the Lempel-Ziv parsing webpage, with a worked example, followed by an example for you to work through where you get an explanation if you have any problems.

Can you parse the following sentence?

can_you_can_a_can_as_a_canner_can_can_a_can?

We have replaced the spaces in the sentence with the "_" character to make it easier to keep track of the spaces and we copied the sentence below, in the Answer so that all you have to do is add the parsing commas.

Answer:

can_you_can_a_can_as_a_canner_can_can_a_can?

Problem 2. Lempel-Ziv Compression.

The next step in the Lempel-Ziv algorithm is the coding of individual strings which is explained on the Lempel-Ziv coding webpage.

The example for which parsing is worked out is:

If_one_doctor_doctors_another_doctor, _does_the_doctor_who_doctors_the_doctor_ doctor_the_doctor_the_way_the_doctor_he_is_doctoring_doctors? _Or_does_he_doctor_the_doctor_the_way_the_doctor_who_doctors_doctors?

which parses as follows:

,I,f,_,o,n,e,_d,oc,t,or,_do,c,to,r,s,_a,no,th,er,_doc,tor,,, _doe,s_,the,_doct,or_,w,h,o_, d,oct,ors,_t,he,_docto,r_, do,ct,or_t,he_,doc,tor_,the_,wa,y,_th,e_,doct,or_h,e_i,s_d, octo,ri,ng,_doctor,s?, _O,r_d,oe,s_h,e_d,octor,_the,_doctor_,the_w,a,y_, the_d,octor_, wh,o_d,octors,_doctors,?

To LZ compress this sentence, we number these strings. The first string, before the first comma, is the empty string (we always start with this), and we give it label 0. We now number all the following strings consecutively. The following table shows the compression process for the first few strings. This last column is the one that is important for us as it contains the LZ compressed sentence. On the webpage, you can practice completing the compression process for this example sentence.

String Position Number
of this string Prefix Position Number
of Prefix Coded String

I 1 empty 0 0I

f 2 empty 0 0f

_ 3 empty 0 0_

o 4 empty 0 0o

n 5 empty 0 0n

e 6 empty 0 0e

_d 7 _ 3 3d

oc 8 o 4 4c

String	Position Number of this string	Prefix	Position Number of Prefix	Coded String
I	1	empty	0	0I
f	2	empty	0	0f
_	3	empty	0	0_
o	4	empty	0	0o
n	5	empty	0	0n
e	6	empty	0	0e
_d	7	_	3	3d
oc	8	o	4	4c

Complete now the following compression table for the sentence you parsed in Problem 1 (we have filled in the first two rows to get you started).

Answer:

String Position Number
of this string Prefix Position Number
of Prefix Coded String

c 1 empty 0 0c

a 2 empty 0 0a

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

Write here the entire compressed sequence.

Answer:

Problem 3. Huffman Coding.

At Generique U. students get the following grades with the following frequencies:

Grade Frequency

A 15% of the time

B 30% Of the time

C 15% of the time

D 5% of the time

P(pass) 20% of the time

F(fail) 5% of the time

I(incomplete) 5% of the time

NA(not assigned) 5% of the time

Historically, the grades are stored into the system using the following code:

Grade Codeword

A 000

B 001

C 010

D 100

P(pass) 011

F(fail) 101

I(incomplete) 110

NA(not assigned) 111

(a) How many bits per grade do you expect the historical code to use (on average)?

(b) Now you decide to use your knowledge of grade frequency for data compression. Design a Huffman code for { A, B, C, D, P, F, I, NA } using the given frequencies. You may draw a labeled tree or write out the code.

Grade	Frequency
A	15% of the time
B	30% Of the time
C	15% of the time
D	5% of the time
P(pass)	20% of the time
F(fail)	5% of the time
I(incomplete)	5% of the time
NA(not assigned)	5% of the time

Grade	Codeword
A	000
B	001
C	010
D	100
P(pass)	011
F(fail)	101
I(incomplete)	110
NA(not assigned)	111