March 3, 2026 · Open Access

Trusted Yet Flexible: High-Level Runtimes for Secure ML Inference in TEEs

Key points

Python-based ML inference executes entirely within Intel SGX enclaves, enhancing confidentiality and integrity.
Python incurs a modest overhead of approximately 17% for small models and, on larger workloads, outperforms a low-level baseline (97% vs. 265% overhead).
This implementation enforces standardized model representations such as ONNX, preventing unsafe code execution during deserialization.
The findings indicate that high-level runtimes inside TEEs can preserve developer productivity while keeping machine learning inference secure.
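
To make the overhead percentages easier to compare, the short sketch below converts them into runtime multipliers. It assumes that "overhead" means the relative slowdown of enclave execution over the same runtime's non-enclave execution, which the summary does not state explicitly; the function name and the labels are illustrative only. The multipliers are relative to each runtime's own baseline and say nothing about absolute execution times.

```python
def slowdown_factor(overhead_percent):
    """Convert a relative overhead in percent into a runtime multiplier.

    Assumption: overhead = (t_enclave - t_native) / t_native * 100.
    """
    return 1.0 + overhead_percent / 100.0


# Percentages as reported in the summary; labels are illustrative.
reported = {
    "Python, small models": 17,
    "Python, larger workloads": 97,
    "low-level baseline, larger workloads": 265,
}

for label, pct in reported.items():
    print(f"{label}: {pct}% overhead = {slowdown_factor(pct):.2f}x native runtime")
```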

Abstract

Machine learning inference is increasingly deployed on shared and cloud infrastructures, where both user inputs and model parameters are highly sensitive. Confidential computing promises to protect these assets using Trusted Execution Environments (TEEs), yet existing TEE-based inference systems remain fundamentally constrained: they rely almost exclusively on low-level, memory-unsafe languages to enforce confinement, sacrificing developer productivity, portability, and access to modern ML ecosystems. At the same time, mainstream high-level runtimes, such as Python, are widely considered incompatible with enclave execution due to their large memory footprints and unsafe model-loading mechanisms that permit arbitrary code execution. To bridge this gap, we present the first Python-based ML inference system that executes entirely inside Intel SGX enclaves while safely supporting untrusted third-party models. Our design enforces standardized, declarative model representations (ONNX), eliminating deserialization-time code execution and confining model behavior through interpreter-mediated execution. The entire inference pipeline (including model loading, execution, and I/O) remains enclave-resident, with cryptographic protection and integrity verification throughout. Our experimental results show that Python incurs modest overheads for small models (≈17%) and outperforms a low-level baseline on larger workloads (97% vs. 265% overhead), demonstrating that enclave-resident high-level runtimes can achieve competitive performance. Overall, our findings indicate that Python-based TEE inference is practical and secure, enabling the deployment of untrusted models with strong confidentiality and integrity guarantees while maintaining developer productivity and ecosystem advantages.
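
The contrast between unsafe model loading and a declarative representation can be illustrated with a minimal sketch that uses only the Python standard library. It is not the authors' implementation and it does not use ONNX itself: the JSON graph layout, `ALLOWED_OPS`, and `run_graph` are hypothetical stand-ins for a declarative format executed by a trusted interpreter. The first part shows that merely loading a pickle stream can run attacker-chosen code (here, a harmless environment-variable write); the second shows a model that is pure data and can only invoke operators from a fixed set.

```python
import json
import os
import pickle


class MaliciousModel:
    """A 'model' whose pickle stream calls exec() as soon as it is loaded."""

    def __reduce__(self):
        # Harmless stand-in for arbitrary attacker code.
        return (exec, ("import os; os.environ['PICKLE_DEMO'] = 'code ran'",))


# 1) Pickle: deserialization alone executes the payload.
blob = pickle.dumps(MaliciousModel())
pickle.loads(blob)
print("after pickle.loads:", os.environ.get("PICKLE_DEMO"))

# 2) Declarative graph: the file is data only, and the interpreter decides
#    which operators exist (hypothetical toy format, not ONNX).
ALLOWED_OPS = {
    "Add": lambda a, b: [x + y for x, y in zip(a, b)],
    "Mul": lambda a, b: [x * y for x, y in zip(a, b)],
    "Relu": lambda a: [max(0.0, x) for x in a],
}


def run_graph(model_json, inputs):
    graph = json.loads(model_json)  # parsing JSON never executes code
    values = dict(inputs)
    values.update(graph.get("initializers", {}))
    for node in graph["nodes"]:
        op = ALLOWED_OPS.get(node["op"])
        if op is None:
            raise ValueError(f"operator not allowed: {node['op']}")
        args = [values[name] for name in node["inputs"]]
        values[node["output"]] = op(*args)
    return values[graph["output"]]


model_json = json.dumps({
    "initializers": {"w": [2.0, -1.0]},
    "nodes": [
        {"op": "Mul", "inputs": ["x", "w"], "output": "h"},
        {"op": "Relu", "inputs": ["h"], "output": "y"},
    ],
    "output": "y",
})
print("graph output:", run_graph(model_json, {"x": [3.0, 4.0]}))
```

In this sketch, a model that names an operator outside `ALLOWED_OPS` is rejected with a `ValueError` instead of being executed, which is the property the declarative approach relies on.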
