Purpose: To evaluate the accuracy, linguistic quality, clinical completeness, and real-world applicability of informed consent forms generated in two Indian regional languages (Hindi and Kannada) by two large language model-based chatbots for common ophthalmic surgical procedures.
Methods: In this comparative, blinded observational study, two chatbots, Gemini 3.0 (Google, California, USA) and ChatGPT 5.0 (OpenAI, California, USA), were prompted to generate informed consent documents in Hindi and Kannada for five ophthalmic scenarios: cataract surgery, traumatic corneal perforation repair, therapeutic penetrating keratoplasty, orbitotomy, and squint surgery. Outputs were independently assessed by four ophthalmologist raters (two per language) using a 10-point scoring system covering correctness, completeness, language and readability, clinical relevance, and real-world applicability. Descriptive statistics were calculated. Chatbot performance was compared using paired t-tests, effect sizes were estimated as Cohen’s d, and inter-rater reliability was assessed using intraclass correlation coefficients (ICCs).
Results: Overall, Gemini 3.0 demonstrated more consistent performance and higher combined mean scores across both languages. In the Hindi cohort, combined mean scores were comparable between Gemini 3.0 (7.85) and ChatGPT 5.0 (8.00), with significant rater-dependent variability in preference. In the Kannada cohort, by contrast, Gemini 3.0 significantly outperformed ChatGPT 5.0 (8.8 vs 7.7, p<0.01), with large to extremely large effect sizes (Cohen’s d: 1.23 to 3.8). Inter-rater reliability was moderate to good for Gemini 3.0 (ICC: 0.62 to 0.71) and lower for ChatGPT 5.0 (ICC: 0.38 to 0.59). ChatGPT 5.0 frequently produced grammatical and terminological errors, particularly in Kannada, which limited its clinical usability.
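The statistical comparison named in the Methods (paired t-test, Cohen’s d, ICC) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors’ analysis code: the abstract does not say which Cohen’s d variant or ICC form was used, so the sketch assumes d_z (mean paired difference divided by the SD of the differences) and ICC(2,1) (two-way random effects, absolute agreement, single rater). The function names and all example scores are hypothetical. Only the t statistic and degrees of freedom are computed; a two-sided p-value would be read from a t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`). Requires Python 3.9+.

```python
import math
from statistics import mean, stdev


def paired_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched scores a[i] vs b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def cohens_dz(a: list[float], b: list[float]) -> float:
    """Assumed effect-size variant d_z: mean paired difference / SD of differences."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences are constant; d_z is undefined")
    return mean(diffs) / sd


def icc_2_1(ratings: list[list[float]]) -> float:
    """Assumed ICC form ICC(2,1): two-way random, absolute agreement, single rater.

    ratings[i][j] is rater j's score for consent form i (n forms, k raters).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Partition total variability into forms (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Hypothetical 10-point scores for five scenarios; NOT data from the study.
    model_a = [8, 9, 9, 8, 9]
    model_b = [7, 8, 7, 8, 8]
    t, df = paired_t(model_a, model_b)
    print(f"t = {t:.3f}, df = {df}, d_z = {cohens_dz(model_a, model_b):.2f}")
    # Hypothetical scores from two raters for the same five forms.
    print(f"ICC(2,1) = {icc_2_1([[8, 9], [7, 7], [9, 9], [6, 7], [8, 8]]):.2f}")
```

The d_z variant is used here because it pairs naturally with the paired t-test (t = d_z × √n); other paired-design variants, such as dividing by the average of the two groups’ SDs, give different magnitudes, so the reported range of effect sizes depends on which one the study chose.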
Conclusion: Large language models can generate clinically usable informed consent forms in Indian regional languages; however, performance varies significantly between models. Gemini 3.0 demonstrated superior linguistic accuracy, consistency, and clinical suitability. Language-specific validation and mandatory human oversight are essential before clinical implementation.
Das et al. studied this question.