What question did this study set out to answer?

This research aims to develop a hybrid model for optimizing image quality in digital film production using deep learning techniques.

June 1, 2026 · Open Access

Application of Deep Learning Image Quality Optimization Model in Digital Movie Production

Key Points

The study designed a hybrid image quality optimization model called CT-GAN.
The model uses convolutional neural networks for local feature extraction and Transformer structures for global context.
The test material was RAW-format HDR footage captured by ARRI Alexa 65 and RED V-Raptor cameras.
The peak signal-to-noise ratio reached 35.2 dB, 23.4% higher than the BM3D method.
The structural similarity index reached 0.96, 4.3% higher than the pix2pix model.
Colorists gave an average subjective score of 4.6 out of 5, and quality retention rates exceeded 96% across different projection systems.

Abstract

Digital film production is moving rapidly toward high resolution, high dynamic range, and wide color gamut, which sets stricter standards for image quality optimization: processing accuracy, efficiency, and multi-scene adaptability must all improve together. Conventional post-production methods rely heavily on manual color correction and filtering, which are relatively slow, easily influenced by subjective preference, and often produce inconsistent results across projection terminals. To address these challenges, this study designed a hybrid image quality optimization model called CT-GAN. The model combines the strength of convolutional neural networks in local feature extraction, the ability of Transformer structures to model global contextual information, and the role of generative adversarial networks in improving perceptual visual quality. The experiments use RAW-format, high-dynamic-range material captured by ARRI Alexa 65 and RED V-Raptor cameras. On 4K single-frame images, the model achieves a peak signal-to-noise ratio of 35.2 dB, 23.4% higher than the traditional BM3D method, and a structural similarity index of 0.96, 4.3% higher than the classic pix2pix model. Single-frame processing takes only 28 milliseconds, which meets the needs of real-time preview. In the subjective evaluation, five senior colorists gave an average score of 4.6 out of 5. The model also maintains consistent picture quality across projection environments, with picture quality retention rates exceeding 96% from IMAX laser cinemas to standard DCI projection systems.
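For readers who want to sanity-check the reported figures, the following is a minimal Python sketch, not the authors' evaluation code. It shows the standard PSNR definition and the arithmetic behind the comparisons. The function names are hypothetical, and two assumptions are made that the abstract does not confirm: that the quoted percentages are relative improvements over the baseline score, and that "real-time preview" refers to the common 24 fps cinema frame rate.

```python
import math

def psnr(reference, test, peak=1.0):
    """Standard PSNR in dB between two equal-length sequences of pixel values.

    `peak` is the maximum possible pixel value (1.0 for normalized data).
    """
    if len(reference) != len(test):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

def implied_baseline(value, percent_higher):
    """Baseline score implied if `value` is `percent_higher` percent above it.

    Assumption: the percentage is a relative improvement over the baseline.
    """
    return value / (1.0 + percent_higher / 100.0)

def frames_per_second(ms_per_frame):
    """Throughput implied by a per-frame processing time in milliseconds."""
    return 1000.0 / ms_per_frame

# Under the relative-improvement assumption:
print(f"{implied_baseline(35.2, 23.4):.1f}")  # 28.5 (implied BM3D PSNR, dB)
print(f"{implied_baseline(0.96, 4.3):.2f}")   # 0.92 (implied pix2pix SSIM)
print(f"{frames_per_second(28):.1f}")         # 35.7 (frames per second)
```

Read this way, 28 ms per frame corresponds to roughly 35.7 frames per second, which is above 24 fps, so the real-time preview claim is at least arithmetically consistent. The implied baseline values are inferences from the stated percentages, not figures reported in the abstract.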

Weiqing Sun studied this question.

synapsesocial.com/papers/6a1d226d02fbce91306381a2 https://doi.org/10.1016/j.procs.2026.03.387
