Dror Baron - Fast algorithms for lossless data compression


The goal of my research in universal lossless compression was to develop algorithms that are universal to unknown input statistics while being fast.

Universality: Our algorithms achieve coding lengths that asymptotically achieve the entropy rate. For length-n inputs, the redundancy above entropy achieved by our methods is 0.5 log(n) + O(1) bits per unknown parameter (conditional probability). This redundancy is within O(1) bits per parameter of Rissanen's lower bounds on the redundancy of universal coding methods.

Computational efficiency: The total amount of computation performed by our algorithms is close to linear in the length n of the data. Despite this efficiency, the redundancy performance is excellent. To achieve these improvements, we used the Burrows Wheeler transform (BWT), which permutes a block of input symbols in an invertible manner. The interesting feature of the BWT is that its output distribution is similar to piecewise i.i.d.; intuitively, it can be said that the BWT removes memory from a discrete source, in an analogous manner to how the Karhunen Loeve transform (KLT) removes correlation between samples of continuous sources. Specific contributions follow.


Last updated June 2014.