- tags
- Compression, Complexity
- papers
- (Lempel, Ziv 1976; Ziv, Lempel 1977; Ziv, Lempel 1978; Welch 1984; Storer, Szymanski 1982)

## Context

The LZW algorithm was originally designed as a complexity (“randomness”) metric for finite sequences (Lempel, Ziv 1976). It was then extended as a compression algorithm by the same authors to LZ77 (Ziv, Lempel 1977) and LZ78 (Ziv, Lempel 1978). Those last two are the basis of many well known and widely used compression utilities such as `GIF`

, `compress`

(LZW (Welch 1984) ) or `DEFLATE`

, `gzip`

(LZSS (Storer, Szymanski 1982)), etc.

## Description

The base idea of these algorithms is to build a dictionary of words in the input sequence. A given sequence is scanned and an empty dictionary is initialized. Each new word is the shortest next word not in the dictionary. It is registered by using a previous word and adding a single digit in the form \((j, s_{last})\). This encoding allows a unique decoding of the original sequence. When encoding **or** decoding the string the same dictionary of words is built. For sequences with string repetitions, the length of each new word increases fast and the numbers of needed pairs to encode increases very slowly.

Example implementation of the LZ algorithm in Python:

```
def compress(string):
dic = {'0': 0, '1': 1}
word = ""
result = []
for c in string:
wc = word + c
if wc in dic:
word = wc
else:
result.append(dic[word])
dic[wc] = len(dic)
word = c
if word:
result.append(dic[word])
return result
print(compress("1101010101011101010001110100"))
print(compress("1010101010101010101010101010"))
```

Results:

```
[1, 1, 0, 3, 5, 4, 4, 2, 7, 3, 0, 8, 6, 0]
[1, 0, 2, 4, 3, 6, 5, 8, 7, 6]
```

## Bibliography

- A. Lempel, J. Ziv. . "On the Complexity of Finite Sequences".
*IEEE Transactions on Information Theory*22 (1):75–81. DOI. - J. Ziv, A. Lempel. . "A Universal Algorithm for Sequential Data Compression".
*IEEE Transactions on Information Theory*23 (3):337–43. DOI. - J. Ziv, A. Lempel. . "Compression of Individual Sequences via Variable-rate Coding".
*IEEE Transactions on Information Theory*24 (5):530–36. DOI. - Welch. . "A Technique for High-performance Data Compression".
*Computer*17 (6):8–19. DOI. - James A. Storer, Thomas G. Szymanski. . "Data Compression via Textual Substitution".
*Journal of the ACM (JACM)*29 (4):928–51. DOI.