Scientific confidence in New Approach Methodologies (NAMs) depends on transparent and comprehensive documentation. The ToxTemp template, based on OECD Guidance Document 211, standardises reporting for cell-based NAMs. However, completing its 77 questions is a substantial bottleneck. This study introduces ToxTempAssistant, a Large Language Model (LLM)-assisted web tool that supports toxicologists in drafting ToxTemp documents from user-supplied context documents, and quantifies the tool's baseline performance under controlled conditions. ToxTempAssistant uses grounded, per-question prompting with mandatory source attribution. The evaluation paired a positive control (expert-completed ToxTemp documents) with a negative control (out-of-scope documents) across three LLMs (gpt-4.1-nano, gpt-4o-mini, o3-mini). Model responses were classified as correct or incorrect against reference answers in a confusion-matrix framework, with agreement determined by a predefined, fixed cosine-similarity threshold; completeness, precision, specificity, and accuracy were derived from the resulting counts. Given expert-completed ToxTemps, ToxTempAssistant reliably reconstructed expert content, with comparable semantic fidelity across models. On out-of-scope documents, conservative models (gpt-4.1-nano) minimised false positives, whereas high-coverage models (o3-mini) were more error-prone on confusable texts. The models exhibited a coverage-caution trade-off: high-coverage models risked answering out of scope, conservative models abstained more often, and gpt-4o-mini balanced useful answers and refusals while remaining cost-effective. Overall accuracy was robust to model choice (∼70%) because differences in recall and specificity compensated for each other. Our findings suggest that ToxTempAssistant can use established LLM capabilities in extraction and summarisation to generate ToxTemp drafts.
When fully adopted, this may shift the toxicologist's role from manual data collator to expert reviewer, lowering the documentation barrier and potentially facilitating the regulatory uptake of NAMs. Future work will prioritise real-world, user-centred evaluation (e.g., edit burden, time-to-completion, abstention correctness) before optimisation. LLM-based tools such as ToxTempAssistant represent a next step toward bridging scattered research outputs and structured regulatory requirements.
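The evaluation scheme summarised above (threshold-based agreement feeding a confusion matrix, from which four metrics are derived) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the threshold value (0.7), the `Item` record layout, and the mapping of outcomes to confusion-matrix cells are hypothetical, and toy two-dimensional vectors stand in for the sentence embeddings a real pipeline would obtain from an embedding model. The sketch assumes that an answer below the threshold counts as a false positive, that abstaining on an answerable question is a false negative, and that completeness is computed as recall, consistent with the abstract's later reference to recall.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

# Hypothetical cut-off: the study's actual threshold value is not stated here.
SIMILARITY_THRESHOLD = 0.7


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Item:
    """One evaluated ToxTemp question (hypothetical record layout)."""
    answer: Optional[Sequence[float]]     # embedding of the model answer; None = abstained
    reference: Optional[Sequence[float]]  # embedding of the expert answer; None = out of scope


def classify(item: Item, threshold: float = SIMILARITY_THRESHOLD) -> str:
    """Assign one item to a confusion-matrix cell (assumed mapping)."""
    if item.reference is None:  # negative control: the correct behaviour is to abstain
        return "TN" if item.answer is None else "FP"
    if item.answer is None:     # positive control, but the model abstained
        return "FN"
    similarity = cosine_similarity(item.answer, item.reference)
    return "TP" if similarity >= threshold else "FP"  # wrong answer counted as FP


def _ratio(num: int, den: int) -> float:
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def evaluate(items: Sequence[Item]) -> Dict[str, float]:
    """Tally confusion-matrix counts and derive the four summary metrics."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for item in items:
        counts[classify(item)] += 1
    tp, fp, tn, fn = counts["TP"], counts["FP"], counts["TN"], counts["FN"]
    return {
        **counts,
        "completeness": _ratio(tp, tp + fn),  # i.e. recall
        "precision": _ratio(tp, tp + fp),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


# Toy example with hand-made 2-D "embeddings".
items = [
    Item(answer=[1.0, 0.0], reference=[1.0, 0.1]),  # similar to reference -> TP
    Item(answer=[0.0, 1.0], reference=[1.0, 0.0]),  # orthogonal -> FP
    Item(answer=None, reference=[1.0, 0.0]),        # abstained on answerable -> FN
    Item(answer=None, reference=None),              # abstained out of scope -> TN
    Item(answer=[1.0, 1.0], reference=None),        # answered out of scope -> FP
]
print(evaluate(items))
```

With these five toy items the sketch yields one true positive, two false positives, one true negative and one false negative, so completeness is 0.5 and accuracy 0.4. Counting a below-threshold answer as a false positive, rather than a false negative, treats a confidently wrong draft as worse than an abstention; the study may partition these cases differently. The same arithmetic shows how overall accuracy can stay stable across models: a model that answers more gains true positives but loses true negatives, and a more cautious model does the reverse.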
Jente Houweling
Matthias M. L. Arras
Egon Willighagen
Evidence-Based Toxicology
Utrecht University
Maastricht University
National Institute for Public Health and the Environment
DOI: https://doi.org/10.1080/2833373x.2026.2638036