What question did this study set out to answer?

This research aims to benchmark the GPT-SoVITS text-to-speech system on Apple Silicon hardware, identifying key performance metrics and issues.

April 10, 2026 | Open Access

Benchmarking Real-Time Voice Cloning on Consumer Apple Silicon: A Practical Evaluation of GPT-SoVITS on M-Series Hardware

Key Points

Benchmarked GPT-SoVITS, an open-source few-shot text-to-speech system, running entirely on consumer Apple Silicon (MacBook Pro M4 Pro, 24GB).
Identified and resolved seven critical platform incompatibilities, including pervasive float16 precision errors.
Fine-tuned a voice model on 37 minutes of speech data in approximately 70 minutes.
Measured 1.5-second end-to-end latency for a real-time voice agent.

Abstract

We present the first systematic benchmark of GPT-SoVITS, an open-source few-shot text-to-speech system, running entirely on consumer Apple Silicon hardware. We identify and resolve seven critical platform incompatibilities, including pervasive float16 precision errors. Using a MacBook Pro M4 Pro (24GB), we fine-tune a voice model on 37 minutes of speech data in ~70 minutes and achieve 1.5-second end-to-end latency for a real-time voice agent. Our complete toolkit is released as open source. Code: https://github.com/akhilsingh-git/voice-clone-toolkit
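The abstract does not describe the float16 errors in detail, so the following is only a generic illustration of why half precision can be fragile, not a reproduction of the issues fixed in the toolkit. It is a minimal sketch using Python's standard `struct` module, whose `e` format encodes IEEE 754 half-precision values. The helper names `to_float16` and `fits_float16` are hypothetical and do not come from GPT-SoVITS or the released code.

```python
import struct

# Largest finite IEEE 754 half-precision (float16) value.
FLOAT16_MAX = 65504.0


def to_float16(x: float) -> float:
    """Round a Python float to the nearest float16 value and convert it back.

    Hypothetical helper for illustration only.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_float16(x: float) -> bool:
    """Return True if x can be packed as a finite float16 value.

    struct raises OverflowError when the magnitude is too large for the
    half-precision format.
    """
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


if __name__ == "__main__":
    # Rounding: 0.1 has no exact float16 representation.
    print("0.1 as float16:", to_float16(0.1))
    # Range: the largest finite value fits, 100000.0 does not.
    print("65504.0 fits:", fits_float16(FLOAT16_MAX))
    print("100000.0 fits:", fits_float16(1.0e5))
```

Half precision keeps roughly three significant decimal digits and cannot hold finite values above 65504, so a computation that is stable in float32 may lose accuracy or overflow when run in float16. A common mitigation, stated here as a general practice rather than as the paper's method, is to run the affected operations in float32.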

Authors

Akhil Singh
