What question did this study set out to answer?

The study evaluates the inference performance of 31 large language models running over Vulkan on the AMD BC-250 APU.

April 10, 2026 (Open Access)

Benchmarking 31 Large Language Models on the AMD BC-250: An Empirical Study of Vulkan-Based Inference on a Unified Memory APU

Key Points

Benchmarked 31 large language models ranging from 3B to 35B parameters.
Used Vulkan via the Mesa RADV driver because AMD's ROCm libraries do not support the GFX1013 GPU.
Documented the Linux kernel tuning that makes the full 16 GB of unified memory available for inference.
A 35-billion-parameter Mixture-of-Experts model with 3B active parameters reached a generation speed of 37.5 tokens per second.
24 of the 31 models reached a verified 64K filled-context ceiling.
Silent context truncation by the Ollama runtime was identified and shown to affect benchmark validity.
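
The two measurement ideas in the points above, generation speed and detection of silent context truncation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's benchmark harness. The function names and the `tolerance` parameter are hypothetical. The inputs are assumed to come from a runtime that reports how many prompt tokens it actually evaluated, how many tokens it generated, and the generation time in nanoseconds.

```python
def tokens_per_second(generated_tokens: int, duration_ns: int) -> float:
    """Generation speed in tokens per second from a token count and a duration in nanoseconds."""
    if duration_ns <= 0:
        raise ValueError("duration_ns must be positive")
    return generated_tokens / (duration_ns / 1e9)


def is_silently_truncated(tokens_sent: int, tokens_evaluated: int, tolerance: int = 0) -> bool:
    """Return True when the runtime evaluated fewer prompt tokens than were sent.

    tokens_sent is the prompt length the benchmark intended to fill the context with;
    tokens_evaluated is what the runtime reports it processed. The tolerance
    (hypothetical) absorbs small differences such as template tokens.
    """
    return tokens_evaluated + tolerance < tokens_sent


# Example: 375 tokens generated in 10 seconds.
speed = tokens_per_second(375, 10_000_000_000)

# Example: a 65,536-token payload of which only 4,096 tokens were evaluated.
truncated = is_silently_truncated(65_536, 4_096)
```

A filled-context result would only count as valid when the truncation check returns False for the payload size being claimed.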

Abstract

The AMD BC-250 is a repurposed cryptocurrency mining board with a Cyan Skillfish APU (Zen 2 CPU, GFX1013 GPU, 24 CUs) and 16 GB of unified GDDR6 memory. Because AMD’s ROCm libraries do not support GFX1013, Vulkan via the Mesa RADV driver was the only GPU compute path verified as functional in this work. This study documents the Linux kernel tuning (ttm.pages_limit) that enables access to the full 16 GB for inference, and reports benchmark results for 31 large language models (3B–35B parameters) covering generation speed, filled-context scaling with real-token payloads, output quality across five task types, and cold-start latency. A 35-billion-parameter Mixture-of-Experts model with 3B active parameters reaches 37.5 tok/s. Of the 31 models tested, 24 reach a verified 64K filled-context ceiling. Silent context truncation by the Ollama runtime is documented and shown to affect benchmark validity. As a supplementary evaluation, the same Vulkan path is also used for image generation, vision inference, and video generation, though these modalities were tested with less rigour.
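
The kernel parameter named in the abstract is expressed in memory pages rather than bytes, so a rough conversion helps show the scale of the value involved. The sketch below is arithmetic only. It assumes a 4 KiB page size and treats "16 GB" as 16 GiB. The helper name is hypothetical, and the value the study actually sets is not stated in this abstract, so the result should be read as an upper bound rather than a recommended setting.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages, the usual x86-64 default


def pages_for_gib(gib: int, page_size: int = PAGE_SIZE_BYTES) -> int:
    """Number of whole pages needed to cover `gib` gibibytes of memory."""
    return (gib * 1024**3) // page_size


# Upper bound for a 16 GiB board, printed in kernel command-line form.
limit = pages_for_gib(16)
print(f"ttm.pages_limit={limit}")
```

In practice some memory has to remain for the operating system and the CPU side of the workload, so a working value would likely be lower than this bound.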

Authors

Artur Andrzejczak
