- tags
- Compression, Complexity
- papers
- (Lempel and Ziv 1976; Ziv and Lempel 1977, 1978; Welch 1984; Storer and Szymanski 1982)

## Context

The LZW algorithm was originally designed as a complexity
(“randomness”) metric for finite sequences (Lempel and Ziv 1976). It was then extended as a
compression algorithm by the same authors to LZ77 (Ziv and Lempel 1977) and LZ78 (Ziv and Lempel 1978). Those last two are the
basis of many well known and widely used compression utilities such
as `GIF`

, `compress`

(LZW (Welch 1984) ) or `DEFLATE`

,
`gzip`

(LZSS (Storer and Szymanski
1982)), etc.

## Description

The base idea of these algorithms is to build a dictionary of
words in the input sequence. A given sequence is scanned and an
empty dictionary is initialized. Each new word is the shortest next
word not in the dictionary. It is registered by using a previous
word and adding a single digit in the form \((j, s_{last})\). This
encoding allows a unique decoding of the original sequence. When
encoding **or** decoding the string the same
dictionary of words is built. For sequences with string
repetitions, the length of each new word increases fast and the
numbers of needed pairs to encode increases very slowly.

Example implementation of the LZ algorithm in Python:

```
def compress(string):
dic = {'0': 0, '1': 1}
word = ""
result = []
for c in string:
wc = word + c
if wc in dic:
word = wc
else:
result.append(dic[word])
dic[wc] = len(dic)
word = c
if word:
result.append(dic[word])
return result
print(compress("1101010101011101010001110100"))
print(compress("1010101010101010101010101010"))
```

Results:

```
[1, 1, 0, 3, 5, 4, 4, 2, 7, 3, 0, 8, 6, 0]
[1, 0, 2, 4, 3, 6, 5, 8, 7, 6]
```

## Bibliography

Lempel, A., and J. Ziv. 1976. “On the
Complexity of Finite Sequences.” *IEEE Transactions on
Information Theory* 22 (1):75–81.

Storer, James A., and Thomas G.
Szymanski. 1982. “Data Compression via Textual Substitution.”
*Journal of the ACM (JACM)* 29 (4):928–51.

Welch. 1984. “A Technique for
High-Performance Data Compression.” *Computer* 17
(6):8–19.

Ziv, J., and A. Lempel. 1977. “A
Universal Algorithm for Sequential Data Compression.” *IEEE
Transactions on Information Theory* 23 (3):337–43.

———. 1978. “Compression of Individual
Sequences via Variable-Rate Coding.” *IEEE Transactions on
Information Theory* 24 (5):530–36.