Image Compression using Sum-Product Networks

Abstract

An estimated 79 zettabytes (10²¹ bytes) of data was generated worldwide in 2021, with even more data expected to be produced in the future. The effective storage and communication of such large amounts of data is an important problem, and data compression lies at the heart of its solution. The two aspects of data compression, data modeling and coding, are typically designed jointly. As a result, it is difficult to evolve compression standards without redesigning the entire architecture. Recently, a model-code separation architecture for compression was proposed, with a model-free encoder and a model-adaptive decoder. The architecture uses a data-independent encoder and employs a probabilistic graphical model (PGM) to capture the source structure in the decoder. Decoding is performed by running belief propagation over the graphical models representing the modeling and coding aspects of compression. In practical settings involving naturally occurring data, e.g., CIFAR-10 images, the PGM underlying the source data is unknown. Existing structure learning algorithms for PGMs are inefficient on large datasets and place additional constraints on the graphical model structure that diminish a PGM's representational power. Due to the difficulty of inference and learning in complex PGMs, the current model-code separation architecture is of limited use in many real-world applications.

In this thesis, we develop a new separation architecture based on recently proposed sum-product networks (SPNs), a class of tractable probabilistic generative models, to model the source distribution. Our architecture strikes a balance between efficient learning of source structure and fast lossless decoding. We show that SPNs admit efficient parameter learning via gradient descent and can capture statistical structure in both synthetic and naturally occurring images. Furthermore, through modifications to the SPN architecture, we describe a procedure to assimilate external beliefs about the source and to compute the marginal probabilities of all source nodes in a single forward and backward pass of the network. By using an SPN source model in place of a PGM, we obtain a new model-code separation architecture for compression.

Throughout this thesis, we focus on the efficient implementation of our compression architecture. We take advantage of modern deep learning frameworks and GPUs to implement the entire architecture using parallelized tensor operations. As a result, we are able to bridge the gap between traditional statistical inference algorithms and modern deep learning models by carefully developing the SPN source-code belief propagation algorithm for source decoding. The resulting algorithm can decode grayscale sources in under 0.04 seconds.

This work applies the proposed architecture to the lossless compression of binary and grayscale images. We compare our architecture against some of today's most widely used compression systems and against theoretical limits. We show that our architecture achieves a 1.7× gain in compression rate over the state-of-the-art JBIG2 compressor on the binarized MNIST dataset. Furthermore, our architecture incurs no performance penalty on grayscale sources, achieving a 1.4× gain in compression rate on the grayscale CIFAR-10 and Fashion MNIST datasets compared against some of the best universal compressors. Extensive analysis on synthetic binary sources shows that our architecture can approach the theoretical limits of compression and match the performance of baseline separation architectures with known PGM structure.
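The single forward-and-backward-pass marginal computation mentioned above is commonly realized through the differential view of SPN inference: with indicator leaves set according to the evidence, the gradient of the root value with respect to each indicator yields that variable's (unnormalized) marginal. The following is a minimal sketch of this idea, assuming a PyTorch autograd setup; the toy mixture-of-products SPN, its weights, and all variable names are illustrative assumptions and are not taken from the thesis.

import torch

n_vars = 4  # four binary source variables X1..X4

# Indicator inputs lambda[i, v]: one-hot for observed evidence, all-ones for
# unobserved variables. Here every variable is left unobserved.
lam = torch.ones(n_vars, 2, requires_grad=True)

# A toy SPN: a root sum node mixing two product nodes, each a product of
# Bernoulli leaves over all four variables (parameters are illustrative).
theta = torch.tensor([[[0.7, 0.3], [0.2, 0.8], [0.5, 0.5], [0.9, 0.1]],
                      [[0.1, 0.9], [0.6, 0.4], [0.3, 0.7], [0.4, 0.6]]])
w = torch.tensor([0.4, 0.6])  # normalized sum-node weights

leaf = (theta * lam).sum(dim=-1)   # leaf values, shape (2, n_vars)
components = leaf.prod(dim=-1)     # product nodes, shape (2,)
S = (w * components).sum()         # forward pass: root value S(lambda)

S.backward()                       # backward pass
marginals = lam.grad               # dS/dlambda[i, v] = P(X_i = v, evidence)

print(S.item())    # 1.0 when all indicators are 1 (weights are normalized)
print(marginals)   # row i holds [P(X_i = 0), P(X_i = 1)]

With evidence present, the corresponding indicators would be set one-hot rather than all-ones, lam.grad[i, v] would equal the joint probability P(X_i = v, evidence), and dividing by S would recover the conditional marginals that a belief propagation decoder could consume as messages.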

Type: Publication
Massachusetts Institute of Technology