Safeguarded AI is a UK government-sponsored research program on AI safety. Its bold aim is to provide a virtually formal proof of the safety of the target AI. In its simplest form, the approach centers on exhaustive verification: each input/output pair of the AI is tested to ensure that the probability of undesirable behavior falls below a defined threshold. This short article makes a few contributions to AI safety research, focusing primarily on the Safeguarded AI approach. First, we propose several mathematical gadgets that could make a brute-force approach to this exhaustive verification computationally efficient. We couple them with a knowledge-elicitation tactic, which is inherently needed to test the socio-technical risks of AI. Second, we discuss several remaining challenges and a caveat. The caveat is the concern that open discussion of safety mechanisms could allow rogue AIs or malicious actors to exploit the mechanisms' vulnerabilities and thereby circumvent the verification in the Safeguarded AI approach without being noticed. All of these discussions are based on an abstract model of computerized simulation. The main aim of this article is to point out the inherent dual-use nature of AI safety science by examining the hard challenges within the Safeguarded AI approach, including the need for secrecy in such research.

Disclaimer: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The author declares that he has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Although the author is a member of Japan's AI Safety Institute, the ideas described in this article are solely his responsibility and do not reflect the views of any organization to which he belongs.
Kengo ZENITANI
www.synapsesocial.com/papers/69ddd9f9e195c95cdefd7580 — DOI: https://doi.org/10.5281/zenodo.19532036