Machine learning inference is increasingly deployed on shared and cloud infrastructures, where both user inputs and model parameters are highly sensitive. Confidential computing promises to protect these assets using Trusted Execution Environments (TEEs), yet existing TEE-based inference systems remain fundamentally constrained: they rely almost exclusively on low-level, memory-unsafe languages to enforce confinement, sacrificing developer productivity, portability, and access to modern ML ecosystems. At the same time, mainstream high-level runtimes, such as Python, are widely considered incompatible with enclave execution due to their large memory footprints and unsafe model-loading mechanisms that permit arbitrary code execution. To bridge this gap, we present the first Python-based ML inference system that executes entirely inside Intel SGX enclaves while safely supporting untrusted third-party models. Our design enforces standardized, declarative model representations (ONNX), eliminating deserialization-time code execution and confining model behavior through interpreter-mediated execution. The entire inference pipeline (including model loading, execution, and I/O) remains enclave-resident, with cryptographic protection and integrity verification throughout. Our experimental results show that Python incurs modest overheads for small models (≈17%) and outperforms a low-level baseline on larger workloads (97% vs. 265% overhead), demonstrating that enclave-resident high-level runtimes can achieve competitive performances. Overall, our findings indicate that Python-based TEE inference is practical and secure, enabling the deployment of untrusted models with strong confidentiality and integrity guarantees while maintaining developer productivity and ecosystem advantages.
Steiakakis et al. (Tue,) studied this question.