What question did this study set out to answer?

The study aims to develop a tri-modal system for detecting nutritional and lifestyle deficiencies using facial, tongue, and eye images.

April 10, 2026 | Open Access

TriModal: Face, Tongue, and Eye as Complementary Visual Channels for Non-Invasive Nutritional and Lifestyle Deficiency Screening

Key Points

The study develops TriModal FaceFuel, a tri-modal computer vision system that screens for nutritional and lifestyle deficiencies from facial, tongue, and eye images.
The new eye analysis module uses YOLO11m trained on 2,497 clinically labeled images across three eye feature classes.
A LAB color gate suppresses false-positive scleral icterus detections by requiring genuine yellow pigmentation (B > 145 in LAB space).
A weighted product-of-experts fusion combines the face, tongue, and eye posteriors into a single 16-dimensional output.
The eye module achieves mAP@0.5 of 0.913, the highest of any single modality in the system.
Upgrading the face detector from YOLOv8m to YOLO11m raises its mAP from 0.790 to 0.872 (a 10.4% relative gain).
The complete system runs in under 235 ms on consumer GPU hardware.
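
As an illustration of the LAB color gate mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes the abstract's threshold (B > 145) refers to the 8-bit LAB convention in which the b channel is stored as b* + 128, and that the gate compares the mean b value of a detected scleral region against that threshold. The function names and the pixel-list input format are hypothetical.

```python
from statistics import mean

# D65 reference white (Y and Z only; X is not needed for the b* channel).
_YN, _ZN = 1.0, 1.08883


def _srgb_to_linear(c):
    """Undo the sRGB transfer curve for one 0-255 channel value."""
    c = c / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t):
    """CIELAB companding function."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def lab_b_8bit(rgb):
    """LAB b channel of one sRGB pixel on a 0-255 scale (assumed: b* + 128)."""
    r, g, b = (_srgb_to_linear(v) for v in rgb)
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    b_star = 200 * (_f(y / _YN) - _f(z / _ZN))
    return b_star + 128


def passes_icterus_gate(pixels, threshold=145.0):
    """Assumed gate rule: keep a detection only if the region's mean b exceeds the threshold."""
    return mean(lab_b_8bit(p) for p in pixels) > threshold


# Hypothetical regions: a neutral white sclera and a visibly yellow one.
white_region = [(240, 240, 240)] * 4
yellow_region = [(230, 210, 120)] * 4
print(passes_icterus_gate(white_region), passes_icterus_gate(yellow_region))
```

A neutral gray or white pixel has b* near zero, so it sits near 128 on this scale and fails the gate, while yellow-tinted pixels push b well above 145.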

Abstract

This paper introduces TriModal FaceFuel, a tri-modal computer vision system that analyzes a selfie photograph and an optional tongue photograph to produce calibrated probability estimates over 16 nutritional and lifestyle deficiency categories. Building on previously published face and tongue pipelines, a new eye analysis module is developed using YOLO11m trained on 2,497 clinically labeled images across three eye feature classes: conjunctival pallor (iron/B12 deficiency), scleral icterus (liver stress), and xanthelasma (cholesterol imbalance). The eye module achieves mAP@0.5 = 0.913, the highest of any single modality in the system. A LAB color gate prevents false-positive scleral icterus detections by verifying genuine yellow pigmentation (B > 145 in LAB space). A three-way weighted product-of-experts fusion combines face (α=0.40), tongue (β=0.35), and eye (γ=0.25) posteriors into a unified 16-dimensional output. The face detector is upgraded from YOLOv8m to YOLO11m, improving mAP from 0.790 to 0.872 (a 10.4% relative gain). The complete tri-modal system runs in under 235 ms on consumer GPU hardware and introduces cholesterol imbalance as a new eye-exclusive deficiency dimension detectable from a standard selfie photograph.
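
To make the fusion step concrete, here is a minimal sketch of a weighted product of experts using the weights stated in the abstract. It rests on assumptions the abstract does not confirm: each of the 16 outputs is treated as an independent per-category probability, the product is taken over both the "present" and "absent" outcomes and then renormalized, and a missing modality (such as the optional tongue photograph) is handled by renormalizing the remaining weights. The function name `fuse` and the dictionary input format are hypothetical.

```python
import math

# Modality weights as given in the abstract: face α, tongue β, eye γ.
WEIGHTS = {"face": 0.40, "tongue": 0.35, "eye": 0.25}


def fuse(posteriors, weights=WEIGHTS, eps=1e-6):
    """Per-category weighted product of experts, computed in log space.

    posteriors maps a modality name to a list of per-category probabilities,
    or to None when that modality is unavailable (assumed handling).
    """
    active = {m: p for m, p in posteriors.items() if p is not None}
    if not active:
        raise ValueError("at least one modality is required")
    total = sum(weights[m] for m in active)  # renormalize over available modalities
    n = len(next(iter(active.values())))
    fused = []
    for k in range(n):
        log_pos = log_neg = 0.0
        for m, probs in active.items():
            w = weights[m] / total
            p = min(max(probs[k], eps), 1 - eps)  # clamp to avoid log(0)
            log_pos += w * math.log(p)
            log_neg += w * math.log(1 - p)
        # Normalize the two weighted products so they sum to one.
        fused.append(1 / (1 + math.exp(log_neg - log_pos)))
    return fused


# Hypothetical inputs: face is confident on every category, the others are neutral.
out = fuse({"face": [0.9] * 16, "tongue": [0.5] * 16, "eye": [0.5] * 16})
print(len(out), round(out[0], 3))
```

With weights that sum to one, modalities that agree on a probability return that same probability, and a single confident modality is pulled toward the others in proportion to its weight.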

Authors

Abdul Moiz Muhammad

Institutions

COMSATS University Islamabad