Paper 8 of the MCH Research Program reported roughly 90% degradation in the Encoding Fidelity Index (EFI) for Kannada, Tamil, and Hindi clinical input relative to English, measured with standard sentence embedding models (MiniLM, MPNet). This paper asks whether that finding is an artefact of the choice of measurement instrument. Using the same 15-sentence clinical battery across 6 embedding models, ranging from monolingual-distilled (all-MiniLM-L6-v2) to language-agnostic (LaBSE), we find: (1) EFI depends strongly on the embedding model: Kannada EFI ranges from 0.081 (MiniLM) to 0.853 (LaBSE), about a 10× difference for identical input; (2) the Dravidian-specific EFI gap from Paper 8 nearly disappears under LaBSE (Indic–European gap: MiniLM = 0.33, LaBSE = 0.035; roughly a 90% reduction), indicating near script-invariance when an appropriate measurement tool is used; (3) Indic-specific masked language models (MuRIL, IndicBERTv2) are degenerate as sentence encoders, with an effective rank of 1.42/10 versus 9.88/10 for MiniLM, and cannot measure EFI; any claim that "Indic models fix encoding" based on raw MLM embeddings is therefore methodologically invalid; (4) critically, variance amplification persists across all five non-degenerate embedding models (Kannada variance ratio (VR) > 1.0, p < 0.05), indicating that it is LLM-intrinsic rather than a measurement artefact; (5) EFI and variance ratio show no significant correlation under LaBSE (r = -0.18, p = 0.73), consistent with their measuring distinct phenomena with different causes. The central conclusion: LaBSE closes the measured EFI gap (a tokenizer/embedding problem, partially addressable) but does not remove variance amplification (an LLM behaviour problem that requires a different intervention). These findings have direct implications for methodology in multilingual clinical AI evaluation.
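The degeneracy check in finding (3) rests on the effective rank of a set of sentence embeddings. The abstract does not say how that quantity was computed, so the following is only a minimal sketch under one common assumption: the entropy-based effective rank, taken as the exponential of the Shannon entropy of the normalised singular values of an uncentred (n_sentences × dim) embedding matrix. The function name `effective_rank` and the toy matrices are hypothetical and are not taken from the paper.

```python
import numpy as np


def effective_rank(embeddings: np.ndarray) -> float:
    """Entropy-based effective rank of an (n_sentences, dim) matrix.

    Assumption: this definition (exp of the entropy of the normalised
    singular values) stands in for the paper's unstated method.
    """
    # Singular values of the raw (uncentred) embedding matrix.
    s = np.linalg.svd(embeddings, compute_uv=False)
    if s.size == 0 or s.max() == 0.0:
        return 0.0
    # Drop numerically zero singular values so log() stays finite.
    s = s[s > s.max() * 1e-12]
    p = s / s.sum()
    return float(np.exp(-(p * np.log(p)).sum()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for a healthy encoder: 10 unrelated sentence vectors.
    spread = rng.normal(size=(10, 384))
    # Toy stand-in for a collapsed encoder: one shared direction plus small noise.
    collapsed = np.ones((10, 1)) * rng.normal(size=(1, 384)) \
        + 0.01 * rng.normal(size=(10, 384))
    print("spread   :", round(effective_rank(spread), 2))
    print("collapsed:", round(effective_rank(collapsed), 2))
```

Under this definition, ten mutually orthogonal embeddings give an effective rank of 10 and ten identical embeddings give 1, so a score near 1 out of 10 would mean the encoder maps all sentences onto almost the same direction and cannot discriminate between them.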
M M LAXMAN
Government Dental College & Research Institute
M M LAXMAN (Wed,) studied this question.
www.synapsesocial.com/papers/69d8967d6c1944d70ce07e45 — DOI: https://doi.org/10.5281/zenodo.19466612