Lempel-Ziv-Welch algorithm

tags
Compression, Complexity
papers
(Lempel and Ziv 1976; Ziv and Lempel 1977, 1978; Welch 1984; Storer and Szymanski 1982)

Context

The LZW algorithm was originally designed as a complexity (“randomness”) metric for finite sequences (Lempel and Ziv 1976). It was then extended as a compression algorithm by the same authors to LZ77 (Ziv and Lempel 1977) and LZ78 (Ziv and Lempel 1978). Those last two are the basis of many well known and widely used compression utilities such as GIF, compress (LZW (Welch 1984) ) or DEFLATE, gzip (LZSS (Storer and Szymanski 1982)), etc.

Description

The base idea of these algorithms is to build a dictionary of words in the input sequence. A given sequence is scanned and an empty dictionary is initialized. Each new word is the shortest next word not in the dictionary. It is registered by using a previous word and adding a single digit in the form \((j, s_{last})\). This encoding allows a unique decoding of the original sequence. When encoding or decoding the string the same dictionary of words is built. For sequences with string repetitions, the length of each new word increases fast and the numbers of needed pairs to encode increases very slowly.

Example implementation of the LZ algorithm:

def compress(string):
    dic = {'0': 0, '1': 1}
    word = ""
    result = []
    for c in string:
        wc = word + c
        if wc in dic:
            word = wc
        else:
            result.append(dic[word])
            dic[wc] = len(dic)
            word = c
    if word:
        result.append(dic[word])
        return result

print(compress("1101010101011101010001110100"))
print(compress("1010101010101010101010101010"))

Results:

[1, 1, 0, 3, 5, 4, 4, 2, 7, 3, 0, 8, 6, 0]
[1, 0, 2, 4, 3, 6, 5, 8, 7, 6]

Bibliography

Lempel, A., and J. Ziv. 1976. “On the Complexity of Finite Sequences.” IEEE Transactions on Information Theory 22 (1):75–81.

Storer, James A., and Thomas G. Szymanski. 1982. “Data Compression via Textual Substitution.” Journal of the ACM (JACM) 29 (4):928–51.

Welch. 1984. “A Technique for High-Performance Data Compression.” Computer 17 (6):8–19.

Ziv, J., and A. Lempel. 1977. “A Universal Algorithm for Sequential Data Compression.” IEEE Transactions on Information Theory 23 (3):337–43.

———. 1978. “Compression of Individual Sequences via Variable-Rate Coding.” IEEE Transactions on Information Theory 24 (5):530–36.


← Back to Notes