Video-based steganography has attracted increasing attention because it offers a higher payload capacity and better imperceptibility than image-based approaches. In this study, a deep learning–based steganographic framework built on the U-Net architecture is proposed to embed textual information in video content and recover it. Unlike traditional least significant bit (LSB)–based techniques, the proposed method uses region-of-interest (ROI) selection and patch-based embedding to improve robustness and visual quality. Textual data are first encoded into image patches, which a trained hiding network then embeds into selected regions of the video frames. A corresponding revealing network recovers the hidden patches, and an optical character recognition (OCR) pipeline extracts the text from them. Experimental results show character recovery accuracies between 81% and 88% while preserving high visual fidelity in the stego videos. These results indicate that the ROI-guided U-Net framework is an effective approach to imperceptible text hiding in video streams.
This study was conducted by Mahmut Sınecen.
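The abstract does not specify how ROIs are chosen or how character recovery accuracy is computed. The sketch below illustrates both steps under two labeled assumptions. First, the hypothetical `select_roi` helper picks the most textured (highest-variance) block of a grayscale frame, a common steganography heuristic because texture masks embedding artifacts. Second, the hypothetical `char_accuracy` metric scores OCR output as one minus the normalized Levenshtein distance to the original text. Neither function is the author's method, and the sketch omits the U-Net hiding and revealing networks and the OCR stage entirely.

```python
import statistics


def select_roi(frame, patch):
    """Hypothetical ROI heuristic: return the (row, col) top-left corner of the
    non-overlapping patch x patch block with the highest pixel variance."""
    rows, cols = len(frame), len(frame[0])
    best, best_var = (0, 0), -1.0
    for r in range(0, rows - patch + 1, patch):
        for c in range(0, cols - patch + 1, patch):
            # Flatten the block's pixels and measure how textured it is.
            vals = [frame[i][j] for i in range(r, r + patch) for j in range(c, c + patch)]
            var = statistics.pvariance(vals)
            if var > best_var:
                best, best_var = (r, c), var
    return best


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, recovered: str) -> float:
    """Assumed metric: 1 - edit_distance / len(reference), clipped at 0."""
    if not reference:
        return 1.0 if not recovered else 0.0
    return max(0.0, 1.0 - levenshtein(reference, recovered) / len(reference))


if __name__ == "__main__":
    # Toy 4x4 frame: only the bottom-right 2x2 block is textured.
    frame = [
        [10, 10, 10, 10],
        [10, 10, 10, 10],
        [10, 10, 0, 255],
        [10, 10, 255, 0],
    ]
    print(select_roi(frame, 2))
    # Two OCR substitutions (O->0, Y->V) in a 13-character message.
    print(round(char_accuracy("STEGANOGRAPHY", "STEGAN0GRAPHV"), 3))
```

Run as a script, the toy example selects block `(2, 2)` and scores the recovered string at 11/13 ≈ 0.846. These are illustrative inputs, not results from the study. Normalizing by the reference length means that insertions, deletions, and substitutions introduced by the revealing network or by OCR all lower the score.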