Background: Patient education materials (PEMs) often exceed the American Medical Association’s (AMA) recommended sixth-grade reading grade level (RGL). While artificial intelligence (AI) offers potential for automated text simplification, concerns persist regarding linguistic quality, content fidelity, and the understandability of simplified PEMs by laypeople.

Objective: This scoping review maps existing evidence on automated language processing technologies for simplifying PEMs for laypeople.

Methods: Following the Joanna Briggs Institute (JBI) methodology and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guideline, 5 bibliographic databases (Ovid MEDLINE, Embase, CINAHL, PsycInfo, and IEEE Xplore) were systematically searched from 2019 to May 2025, supplemented by reference screening and gray literature searches. Eligible sources were peer-reviewed empirical studies published in English that examined large language models (LLMs), AI-supported writing assistants, AI-based conversational agents, or AI-supported tools designed for automatic text simplification of PEMs. Targeted outcomes included linguistic quality (ie, linguistic comprehensibility, linguistic correctness) and content fidelity (ie, factual accuracy, factual completeness) of simplified PEMs. Excluded sources comprised rule-based systems, manual text simplification, non-laypeople target groups, and technology-focused performance metrics. Results were synthesized via thematic analysis across the domains of targeted outcomes. In accordance with JBI methodology, a risk-of-bias assessment was not performed.

Results: A total of 31 eligible studies met the inclusion criteria, examining various LLMs, including OpenAI’s GPT series, Gemini, Bard, Claude, Copilot, and Llama. Specifically, GPT-4.0 achieved the most consistent improvements in standardized readability metrics (eg, the Flesch-Kincaid Grade Level [FKGL]). However, achieving predefined target RGLs remained challenging across all LLMs, particularly at lower RGLs. Findings on content fidelity were inconsistent: despite high content similarity scores, content accuracy was often compromised.

Conclusions: This is the first scoping review to comprehensively synthesize evidence on automated technologies for text simplification in PEMs. The review identified 2 critical validation gaps. First, no study examined the linguistic correctness (eg, grammar and typographical errors) of automatically simplified PEMs. Second, and most notably, the understandability of the simplified PEMs was assessed exclusively by experts, with no empirical evaluation involving laypeople. Although LLMs effectively reduce text complexity as measured by objective readability metrics, reliance on these formulas represents a critical limitation, as they serve merely as structural proxies. Improvements in readability do not guarantee the maintenance of content accuracy or laypeople’s understandability. Current evidence is further limited by the lack of systematic prompt quality evaluation and the predominant focus on English-language PEMs in US contexts, restricting generalizability. This review provides a foundation for future research by highlighting the need for validated evaluation frameworks that encompass layperson testing and content verification. For clinical practice, LLMs should currently serve as assistive tools, with mandatory expert review remaining essential to verify content fidelity before disseminating LLM-simplified PEMs to laypeople.
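
To illustrate why readability formulas such as the FKGL act only as structural proxies, the following minimal Python sketch computes the standard FKGL formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. It is not the tooling used in any of the reviewed studies. The function names and the vowel-group syllable heuristic are assumptions made for illustration; validated readability tools typically count syllables more carefully.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption, not a dictionary lookup):
    count groups of consecutive vowels and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "grade" -> 1 syllable, but keep "-le" endings such as "table" -> 2.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level using the standard published coefficients."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Hypothetical example sentences, not taken from any reviewed PEM.
original = "Hypertension necessitates pharmacological intervention."
simplified = "High blood pressure needs medicine."
print(f"original FKGL:   {fkgl(original):.2f}")
print(f"simplified FKGL: {fkgl(simplified):.2f}")
```

The score depends only on sentence length and syllable counts, so a simplified text can score well below the sixth-grade target while still omitting or distorting clinical content. This is the gap between readability and content fidelity that the review highlights.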
Krenn et al. studied this question.
www.synapsesocial.com/papers/69fecfcdb9154b0b82876d4e — DOI: https://doi.org/10.2196/88365
Cornelia Krenn
Christine Loder
Natalie Berger
Journal of Medical Internet Research