What question did this study set out to answer?

This study aims to evaluate the quality and clinical applicability of informed consent forms generated by Gemini 3.0 and ChatGPT 5.0 for ophthalmic procedures in Hindi and Kannada.

March 22, 2026 | Open Access

Comparative Evaluation of Gemini 3.0- and ChatGPT 5.0-Generated Regional Language Informed Consent Forms in Ophthalmology: A Dual-Rater Study in Hindi and Kannada

Key Points

This study aims to evaluate the quality and clinical applicability of informed consent forms generated by Gemini 3.0 and ChatGPT 5.0 for ophthalmic procedures in Hindi and Kannada.
The design was a comparative, blinded observational study of chatbot-generated consent forms.
Two chatbots were evaluated for five ophthalmic scenarios in Hindi and Kannada.
Four ophthalmologist raters assessed outputs using a 10-point scoring system.
Paired t-tests compared performance, and consistency was evaluated with inter-rater reliability metrics.
Across both languages, Gemini 3.0 showed more consistent performance and higher combined mean scores.
In Hindi, combined mean scores were comparable (Gemini 3.0: 7.85; ChatGPT 5.0: 8.00), but rater preferences varied significantly.
In Kannada, Gemini 3.0 significantly outperformed ChatGPT 5.0 (8.8 vs 7.7, p<0.01).
Inter-rater reliability was moderate to good for Gemini 3.0 but lower for ChatGPT 5.0.

Abstract

Purpose: To evaluate the accuracy, linguistic quality, clinical completeness, and real-world applicability of informed consent forms generated in Indian regional languages (Hindi and Kannada) by two large language model-based chatbots for common ophthalmic surgical procedures.
Methods: In this comparative, blinded observational study, two chatbots (Gemini 3.0 (Google, California, USA) and ChatGPT 5.0 (OpenAI, California, USA)) were prompted to generate informed consent documents in Hindi and Kannada for five ophthalmic scenarios: cataract surgery, traumatic corneal perforation repair, therapeutic penetrating keratoplasty, orbitotomy, and squint surgery. Outputs were independently assessed by four ophthalmologist raters (two for each language) using a 10-point scoring system based on correctness, completeness, language and readability, clinical relevance, and real-world applicability. Descriptive statistics were calculated. Paired t-tests were used to compare chatbot performance, effect sizes (Cohen’s d) were estimated, and inter-rater reliability was assessed using intraclass correlation coefficients (ICCs).
Results: Across both languages, Gemini 3.0 demonstrated more consistent performance and higher combined mean scores. In the Hindi cohort, combined mean scores were comparable between Gemini 3.0 (7.85) and ChatGPT 5.0 (8.00), with significant rater-dependent preference variability. In contrast, in the Kannada cohort, Gemini 3.0 significantly outperformed ChatGPT 5.0 (8.8 vs 7.7, p<0.01), with large to extremely large effect sizes (Cohen’s d: 1.23-3.8). Inter-rater reliability was moderate to good for Gemini 3.0 (ICC: 0.62-0.71) and lower for ChatGPT 5.0 (ICC: 0.38-0.59). ChatGPT 5.0 exhibited frequent grammatical and terminological inaccuracies, particularly in Kannada, affecting clinical usability.
Conclusion: Large language models can generate clinically usable informed consent forms in Indian regional languages; however, performance varies significantly between models. Gemini 3.0 demonstrated superior linguistic accuracy, consistency, and clinical suitability. Language-specific validation and mandatory human oversight are essential before clinical implementation.
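
Illustrative sketch: The Methods mention paired t-tests and Cohen's d for comparing the two chatbots on the same scenarios. A minimal Python sketch of that kind of paired comparison might look like the following. The function name `paired_t_and_cohens_d`, the choice of the paired-differences variant of Cohen's d (often written d_z), and the example scores are all assumptions made for illustration; the abstract does not state which effect-size formula or software the authors used, and the numbers below are not data from the study.

```python
import math
import statistics


def paired_t_and_cohens_d(scores_a, scores_b):
    """Return (t statistic, degrees of freedom, Cohen's d_z) for paired scores.

    Assumes each position in scores_a and scores_b refers to the same
    scenario rated for two different models.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    t_stat = mean_diff / (sd_diff / math.sqrt(n))  # paired t statistic
    d_z = mean_diff / sd_diff  # effect size based on paired differences
    return t_stat, n - 1, d_z


# Made-up scores on a 10-point scale for five scenarios (hypothetical, not study data).
model_a = [9, 8, 9, 9, 8]
model_b = [8, 8, 7, 8, 7]
t_stat, df, d_z = paired_t_and_cohens_d(model_a, model_b)
print(f"t({df}) = {t_stat:.2f}, Cohen's d_z = {d_z:.2f}")
```

The t statistic would still need to be compared against a t distribution with the returned degrees of freedom to obtain a p-value; a statistics package would normally handle that step.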
