April 10, 2026 · Open Access

ToxTempAssistant: using large language models to standardise cell-based toxicological test method descriptions

Key Points

The study introduces ToxTempAssistant, a tool that helps toxicologists generate ToxTemp documents efficiently.
ToxTempAssistant targets the ToxTemp template, which is based on OECD Guidance Document 211 for reporting NAMs.
Performance was evaluated under controlled conditions, with expert-completed ToxTemp documents as a positive control and out-of-scope documents as a negative control.
Responses were classified as correct or incorrect in a confusion-matrix framework, with a fixed cosine-similarity threshold determining agreement with the reference answers.
Overall accuracy was approximately 70% regardless of which of the three LLMs (gpt-4.1-nano, gpt-4o-mini, o3-mini) was used.
Conservative models (gpt-4.1-nano) minimised false positives on out-of-scope documents, whereas high-coverage models (o3-mini) were more error-prone on confusable texts.
Semantic fidelity was comparable across models, and expert content was reconstructed reliably from expert-completed ToxTemps.

Abstract

Scientific confidence in New Approach Methodologies (NAMs) depends on transparent and comprehensive documentation. The ToxTemp template, based on OECD Guidance Document 211, standardises reporting for cell-based NAMs. However, completing its 77 questions constitutes a substantial bottleneck. The aim of this study is to introduce ToxTempAssistant, a large language model (LLM)-assisted web tool that supports toxicologists in drafting ToxTemp documents based on user-supplied context documents. This study quantifies the tool’s baseline performance under controlled conditions. ToxTempAssistant uses grounded, per-question prompting with mandatory source attribution. Evaluation paired a positive control (expert-completed ToxTemp documents) with a negative control (out-of-scope documents) across three LLMs (gpt-4.1-nano, gpt-4o-mini, o3-mini). Performance was assessed in a confusion-matrix framework: model responses were classified as correct or incorrect against reference answers, with a fixed cosine-similarity threshold determining agreement, and completeness, precision, specificity, and accuracy were derived from the resulting counts. Provided with expert-completed ToxTemps, ToxTempAssistant reliably reconstructed expert content with comparable semantic fidelity between models. On out-of-scope documents, conservative models (gpt-4.1-nano) minimised false positives, whereas high-coverage models (o3-mini) were more error-prone on confusable texts. The LLMs exhibited a coverage-caution trade-off: high-coverage models risked answering out-of-scope questions, conservative models abstained more often, and gpt-4o-mini offered a balance of useful answers and refusals while being cost-effective. Overall accuracy was robust to model choice (approximately 70%) because of compensating patterns in recall and specificity. Our findings suggest that ToxTempAssistant can use established LLM capabilities in extraction and summarisation to generate ToxTemp drafts.
When fully adopted, this may shift the toxicologist’s role from manual data collator to expert reviewer, lowering the documentation barrier and potentially facilitating the regulatory uptake of NAMs. Future work will prioritise real-world, user-centred evaluation (e.g., edit burden, time-to-completion, abstention correctness) before optimisation. LLM-based tools like ToxTempAssistant represent a next step toward bridging scattered research outputs with structured regulatory requirements.
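
The confusion-matrix evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made for illustration only: the names `Response`, `classify`, `confusion_counts` and `derive_metrics` are hypothetical, answers and references are represented as already-computed embedding vectors, an abstention is encoded as `None`, an in-scope abstention is counted as a false negative, "completeness" is treated as recall, and the threshold value passed in is a placeholder because the abstract does not state the cut-off used.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Response:
    """One model response (hypothetical structure, not from the paper)."""
    in_scope: bool                                    # True: positive control, False: negative control
    answer_vec: Optional[Sequence[float]]             # None means the model abstained
    reference_vec: Optional[Sequence[float]] = None   # expert answer embedding (positive control only)


def classify(resp: Response, threshold: float) -> str:
    """Assign a response to TP, FN, TN or FP under the assumed rules."""
    abstained = resp.answer_vec is None
    if not resp.in_scope:
        # Out-of-scope document: abstaining is correct, answering is a false positive.
        return "TN" if abstained else "FP"
    if abstained:
        # Assumption: a missing answer on an in-scope question is a false negative.
        return "FN"
    if resp.reference_vec is None:
        raise ValueError("in-scope responses need a reference vector")
    similar = cosine_similarity(resp.answer_vec, resp.reference_vec) >= threshold
    return "TP" if similar else "FN"


def confusion_counts(responses: List[Response], threshold: float) -> Dict[str, int]:
    """Count the four confusion-matrix cells over all responses."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for resp in responses:
        counts[classify(resp, threshold)] += 1
    return counts


def derive_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive the four metrics named in the abstract from the cell counts."""
    tp, fp, tn, fn = counts["TP"], counts["FP"], counts["TN"], counts["FN"]

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "completeness": ratio(tp, tp + fn),   # assumed equivalent to recall
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }
```

Under these assumptions, a model that abstains often gains specificity and loses completeness, while a model that answers freely does the opposite. Because accuracy pools both kinds of correct outcome, the two effects can offset each other, which is consistent with the abstract's remark that overall accuracy was robust to model choice.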

Authors

Jente Houweling

Matthias M. L. Arras

Egon Willighagen

Journals

SHILAP Revista de lepidopterología

Evidence-Based Toxicology

Institutions

Utrecht University

Maastricht University

National Institute for Public Health and the Environment
