What question did this study set out to answer?

The study aims to improve monocular depth estimation at low input resolution (96×96) with a new, efficient architecture.

April 25, 2026 | Open Access

DeepEchoNet: A Lightweight Architecture for Low Resolution Monocular Depth Estimation

Key Points

The aim is to enhance monocular depth estimation at low resolutions with a new efficient architecture.
Developed DeepEchoNet, combining CNN and transformer elements for efficient processing at 96×96 resolution.
Paired a training objective that is aware of low resolution with a joint RGB-depth augmentation pipeline that follows a strong-to-weak schedule.
Used a guided decoder that selectively fuses multi-scale skip features through efficient recalibration modules and separable convolutions.
DeepEchoNet provides stable training and improved accuracy even at 96×96 inputs.
Targeted better depth map prediction at 96×96 than naively downsized high-resolution architectures, which often train unstably and lose accuracy at this scale.
Maintained geometric consistency despite challenges posed by low resolution.

Abstract

Monocular depth estimation (MDE) has become a practical alternative to active range sensing in many indoor scenarios, enabled by supervised deep learning models that predict dense depth maps from a single RGB image. However, most modern MDE systems assume mid-to-high resolution inputs and non-trivial compute budgets, limiting their direct applicability in embedded and bandwidth-constrained settings. This paper studies low resolution MDE, focusing on 96×96 inputs, where geometric cues are strongly degraded and naively downsizing high-resolution architectures often leads to unstable training and poor accuracy. We propose DeepEchoNet, a lightweight hybrid CNN-transformer model tailored to operate natively at 96×96 resolution. The design combines a MobileViT-inspired encoder with MobileNetV2-style inverted residual blocks and lightweight transformer blocks, and a guided decoder that selectively fuses multi-scale skip features through efficient recalibration modules and separable convolutions. We further adopt a training objective that is aware of low resolution, along with a joint RGB–depth augmentation pipeline that includes a strong-to-weak schedule, to improve robustness while preserving coarse geometric consistency.
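The joint RGB-depth augmentation with a strong-to-weak schedule can be illustrated with a minimal sketch. It is not the authors' pipeline: the linear decay, the start and end probabilities (0.9 and 0.1), the choice of a horizontal flip, and the function names are all assumptions made for illustration. The point it shows is that a geometric transform must be applied identically to the image and its depth map so that pixels stay aligned, while the chance of augmenting shrinks as training progresses.

```python
import random


def strong_to_weak_probability(epoch, total_epochs, p_start=0.9, p_end=0.1):
    """Hypothetical schedule: linearly decay the augmentation probability.

    p_start and p_end are illustrative values, not taken from the paper.
    """
    if total_epochs <= 1:
        return p_end
    # Clamp the epoch into [0, total_epochs - 1] and normalise to [0, 1].
    t = min(max(epoch, 0), total_epochs - 1) / (total_epochs - 1)
    return (1.0 - t) * p_start + t * p_end


def hflip(image):
    """Mirror a row-major 2D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def joint_augment(rgb, depth, epoch, total_epochs, rng=random):
    """Apply the same geometric transform to an RGB image and its depth map.

    rgb is a list of rows of (r, g, b) tuples; depth is a list of rows of
    floats with the same height and width. rng needs a random() method.
    """
    p = strong_to_weak_probability(epoch, total_epochs)
    # A single random draw decides for both inputs, keeping them aligned.
    if rng.random() < p:
        rgb, depth = hflip(rgb), hflip(depth)
    return rgb, depth
```

Calling `joint_augment(rgb, depth, epoch, total_epochs)` once per training sample would flip most pairs early in training and few near the end. In a fuller pipeline, photometric changes such as brightness or contrast jitter would normally touch only the RGB image, since depth values encode geometry rather than appearance.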
