What question did this study set out to answer?

The research aims to develop machine learning models that can effectively analyze pathology data and improve clinical diagnostics.

April 12, 2026 | Open Access

Towards robust generalizable machine learning models for computational pathology

Key Points

Trained large-scale self-supervised foundation models on up to 1.2 million whole slide images to learn tissue representations without expert labels.
Developed a novel anomaly detection approach to identify rare diseases that are underrepresented in training data.
Analyzed Clever-Hans effects in representation learning, where models exploit spurious correlations, and proposed first mitigation strategies.
Motivated by the potential for automated analysis of tissue samples at scale.
Motivated by potential gains in diagnostic precision and inter-observer agreement.
Addressed challenges related to data scarcity and imbalanced training data.

Abstract

Machine learning has the potential to revolutionize pathology by enabling automated analysis of tissue samples at scale, discovering novel digital biomarkers, and providing quantitative measurements that enhance diagnostic precision and consistency. These advances could significantly reduce diagnostic turnaround times, improve inter-observer agreement, and support more personalized treatment decisions. Despite significant advances in research, pathology AI systems have not yet been translated into routine clinical diagnostics because machine learning algorithms face several critical challenges. Data scarcity limits the development of robust models because expert annotations are expensive and time-consuming, while imbalanced training data distributions fail to represent the full spectrum of pathological conditions encountered in clinical practice. Additionally, systematic batch effects arising from differences in staining protocols, scanner types, and institutional practices create domain shifts that compromise model generalizability across clinical settings. This thesis addresses several of these fundamental challenges through three key contributions. First, we train large-scale self-supervised foundation models on up to 1.2 million whole slide images (WSIs) to overcome the lack of annotated data and learn general tissue representations without requiring expert labels. Second, we develop a novel anomaly detection approach that enables the identification of rare diseases, addressing the challenge of detecting pathological conditions that are underrepresented in training data. Third, we systematically analyze Clever-Hans effects in representation learning, where models exploit spurious correlations rather than clinically relevant features, and provide first strategies for their mitigation. Overall, this thesis provides major building blocks for making AI models for computational pathology more robust and bringing them closer to successful clinical deployment.
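
To make the second contribution more concrete, the following is a minimal sketch of one common way to score anomalies on top of learned representations: the mean distance from a query embedding to its k nearest neighbours among embeddings of common findings. This is an illustrative assumption and not the method developed in the thesis, which the abstract does not specify. The function name `knn_anomaly_scores`, the choice of k, and the synthetic Gaussian data standing in for tissue embeddings are all hypothetical.

```python
import numpy as np


def knn_anomaly_scores(train_embeddings, query_embeddings, k=5):
    """Mean Euclidean distance from each query to its k nearest training embeddings.

    Hypothetical helper for illustration; larger scores mean the query lies
    further from anything seen in the training embeddings.
    """
    # Pairwise distances with shape (n_query, n_train).
    diffs = query_embeddings[:, None, :] - train_embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # np.partition places the k smallest distances of each row in the first k columns.
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


# Synthetic stand-ins for embeddings (assumption: real ones would come from a
# pretrained feature extractor applied to tissue patches).
rng = np.random.default_rng(0)
common = rng.normal(0.0, 1.0, size=(500, 16))         # frequent findings
typical_query = rng.normal(0.0, 1.0, size=(20, 16))   # resembles the training data
rare_query = rng.normal(6.0, 1.0, size=(20, 16))      # shifted cluster, stand-in for a rare condition

typical_scores = knn_anomaly_scores(common, typical_query)
rare_scores = knn_anomaly_scores(common, rare_query)
```

In this toy setting the shifted cluster receives clearly higher scores than the typical queries, so a threshold on the score would flag it for review. Real histopathology embeddings are far less cleanly separated, and how to set such a threshold is a separate question.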

Authors

Jonas Dippel
