Demand for image and video data continues to grow, and storing and transmitting such large volumes of data is a significant challenge. Because this trend is likely to continue, new methods for compressing image and video data are needed. Recent advances in neural network-based compression offer potential solutions. Neural networks can learn nonlinear transforms, complex probability distributions, and other components essential to image and video compression. Because they learn image properties directly from image and video data, these properties can be built into the encoder. In this thesis, we propose a novel neural network-based image compression framework and a set of variable bitrate solutions designed for it. These solutions have also been integrated into the JPEG AI standard platform, and the benefit of the technology has been validated through subjective visual experiments. The human visual system is more sensitive to luminance than to chrominance, and the spatial distributions of luminance and chrominance in images are highly correlated. Based on these observations, we propose a conditional color separation compression framework that processes luminance and chrominance in parallel. The framework uses a more complex neural network for luminance to ensure quality and a less complex network for chrominance to reduce computational complexity. To maintain the reconstruction quality of chrominance, luminance is used as auxiliary information for chrominance. 
This framework achieves a 42% reduction in complexity at a cost of only 4% in objective quality. Its adoption by JPEG AI reflects this balance between complexity and performance. To enable variable-rate operation, we then introduce a three-dimensional quality map with the same dimensions as the latent tensor. The map is obtained by expanding two maps with different functions: a channel-wise quality map that provides overall rate control, and a spatial quality map that enhances regions of interest. We also propose an efficient bitrate matching algorithm that lets the model reach the target rate quickly while preserving variable-rate performance; it increases computational speed by a factor of 4 to 6 without any loss. Finally, we integrate the variable bitrate technology into our model to provide accurate bitrate matching. Subjective visual experiments against other encoders show that our model offers an overall visual improvement at the same bitrate.
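The quality map and rate matching described above can be illustrated with a minimal sketch. It is not the thesis implementation, and every name in it is hypothetical. It assumes the two maps are combined by broadcasting and multiplication, stands in a toy rate proxy for a real entropy model, and uses plain bisection over a single gain in place of the efficient matching algorithm, whose details are not given here.

```python
import numpy as np


def expand_quality_map(channel_map, spatial_map):
    """Expand a (C,) channel-wise map and an (H, W) spatial map to (C, H, W).

    Assumption: the two maps are combined by element-wise multiplication
    after broadcasting. The source only says the 3D map is obtained by
    "expanding" two maps, so this is one plausible reading.
    """
    channel_map = np.asarray(channel_map, dtype=np.float64)
    spatial_map = np.asarray(spatial_map, dtype=np.float64)
    return channel_map[:, None, None] * spatial_map[None, :, :]


def toy_rate(latent, quality_map):
    """Hypothetical rate proxy (not a real entropy model).

    Scales the latent by the quality map, rounds to integers, and returns
    the mean of log2(1 + |symbol|). It is non-decreasing in the map values,
    which is the only property the bisection below relies on.
    """
    symbols = np.round(latent * quality_map)
    return float(np.mean(np.log2(1.0 + np.abs(symbols))))


def match_rate(latent, spatial_map, target, lo=0.0, hi=64.0, iters=40):
    """Bisect a single channel gain until the toy rate meets the target.

    Generic bisection used as a stand-in; it assumes the target lies
    between the rates obtained at gains `lo` and `hi`.
    """
    num_channels = latent.shape[0]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        qmap = expand_quality_map(np.full(num_channels, mid), spatial_map)
        if toy_rate(latent, qmap) < target:
            lo = mid  # rate too low: raise the gain
        else:
            hi = mid  # rate meets the target: try a smaller gain
    return hi


# Usage: an 8-channel 16x16 latent with a region of interest weighted 2x.
rng = np.random.default_rng(0)
latent = rng.normal(size=(8, 16, 16))
spatial = np.ones((16, 16))
spatial[4:8, 4:8] = 2.0  # region of interest gets finer quantization
gain = match_rate(latent, spatial, target=2.0)
print(gain, toy_rate(latent, expand_quality_map(np.full(8, gain), spatial)))
```

In this sketch the channel-wise map moves the overall rate up or down, while the spatial map shifts bits toward the region of interest without changing the search procedure.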
Panqi Jia (Thu,) studied this question.