What question did this study set out to answer?

The aim is to improve the reliability of AI systems, making them trustworthy and accurate in real-world applications.

April 12, 2026

Open Access

Teach AI What It Doesn't Know

Key Points

Developed algorithms for unknown-aware learning with minimal human input.
Introduced automated outlier generation and decision boundary regularization.
Established methods for leveraging unlabeled deployment data for out-of-distribution (OOD) detection and generalization.
Designed frameworks to detect hallucinations and defend against malicious prompts in AI models.
Achieved state-of-the-art performance in OOD detection and generalization under diverse distribution shifts.
Enhanced interpretability and decision-making reliability of AI systems.
Provided theoretical guarantees for the reliability frameworks developed.

Abstract

AI systems are rapidly transitioning from laboratory demonstrations to decision‐making technologies deployed in high‐stakes domains. Yet reliability remains a primary obstacle to responsible adoption: discriminative models can be confidently wrong on out‐of‐distribution (OOD) inputs, and foundation models (FMs) such as large language models (LLMs) can generate fluent but untruthful, harmful, or misaligned outputs. My research develops the foundations of reliable machine learning with minimal human supervision, unifying algorithms and theory that make reliability a first‐class objective alongside accuracy. I advance unknown‐aware learning through automated outlier generation, introducing feature‐ and input‐space synthesis frameworks that regularize decision boundaries and improve interpretability. I further establish principled methods for learning “in the wild” by leveraging unlabeled deployment data under mixture and contamination models, with theoretical guarantees and state‐of‐the‐art performance for OOD detection and generalization under diverse shifts. Finally, I design reliability frameworks for FMs that exploit unlabeled signals to detect hallucinations, defend against malicious prompts in vision–language models, and denoise preference data for more dependable alignment. Collectively, these contributions provide a cohesive toolkit for deploying AI systems that remain accurate, calibrated, and trustworthy in open‐world environments.
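To make the idea of feature-space outlier synthesis concrete, the following is a minimal illustrative sketch and not the author's implementation. It assumes in-distribution features are roughly Gaussian, fits that Gaussian, and keeps the lowest-density samples as "virtual outliers" that could later be used to regularize a decision boundary. The function names (`fit_gaussian`, `mahalanobis_sq`, `synthesize_outliers`), the Gaussian assumption, and the candidate counts are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_gaussian(features):
    """Estimate the mean and a lightly regularized covariance of the features."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return mean, cov


def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance of each row of x from the fitted Gaussian."""
    diff = x - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)


def synthesize_outliers(mean, cov, n_candidates=10000, n_keep=100, rng=None):
    """Sample candidates from the fitted Gaussian and keep the least likely ones.

    Assumption: the low-density tail of the in-distribution Gaussian serves as
    a stand-in for "unknown" inputs near the boundary of the training data.
    """
    rng = np.random.default_rng(rng)
    candidates = rng.multivariate_normal(mean, cov, size=n_candidates)
    d2 = mahalanobis_sq(candidates, mean, np.linalg.inv(cov))
    keep = np.argsort(d2)[-n_keep:]  # largest distance = lowest density
    return candidates[keep]


# Toy usage: 2-D stand-in "features" for a single in-distribution class.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 2))
mean, cov = fit_gaussian(features)
outliers = synthesize_outliers(mean, cov, rng=0)
```

In a fuller pipeline, such synthesized points would typically be fed to an auxiliary loss that pushes the model toward low confidence on them; that training step is omitted here.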

Authors

Sean Du

Journals

AI Magazine

Institutions

Nanyang Technological University
