Benchmarks of Radeon 780M iGPU with shared 128GB DDR5 RAM running various MoE models under Llama.cpp

https://www.reddit.com/r/LocalLLaMA/comments/1qaap05/benchmarks_of_radeon_780m_igpu_with_shared_128gb/



Posted by u/AzerbaijanNyan

 


 

I've been looking for a budget system capable of running the more recent MoE models for basic one-shot queries. The main goal was finding something energy-efficient enough to keep online 24/7 without racking up an exorbitant electricity bill.

I eventually settled on a refurbished Minisforum UM890 Pro, which at the time (September) seemed like the most cost-efficient option for my needs.

 

- UM890 Pro
- AMD Radeon™ 780M iGPU
- 128 GB DDR5 (Crucial 2×64 GB kit, 5600 MHz SODIMM, CL46)
- 2 TB M.2
- Linux Mint 22.2
- ROCm 7.1.1 with HSA_OVERRIDE_GFX_VERSION=11.0.0 override
- llama.cpp build: b13771887 (7699)
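The 780M identifies as gfx1103, which has no official ROCm support; the override above makes the runtime treat it as the supported gfx1100. For reference, here is a minimal sketch of how such a build is typically set up. The CMake flags follow llama.cpp's HIP build documentation, but the paths and values are assumptions, not the exact configuration behind these numbers:

```bash
# Sketch: build llama.cpp with the ROCm/HIP backend for a 780M iGPU.
# Assumes ROCm 7.x is installed and a recent llama.cpp checkout.

# gfx1103 has no official kernels, so compile for gfx1100 and tell the
# ROCm runtime to report the iGPU as gfx1100:
export HSA_OVERRIDE_GFX_VERSION=11.0.0

cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

# If the iGPU only sees a fraction of system RAM, the GTT limit may need
# raising via kernel parameters (e.g. amdgpu.gttsize); the right values
# are system-specific, so check your kernel/distro documentation.
```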

 

Below are some benchmarks using various MoE models. Llama 7B is included for comparison, since there's an ongoing thread gathering data for various AMD cards under ROCm: Performance of llama.cpp on AMD ROCm (HIP) #15021.

I also tested various Vulkan builds but found the performance too close to ROCm's to warrant switching, especially since I'm also testing other AMD cards under ROCm on this system over OCuLink.
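For anyone who wants to reproduce that comparison, the Vulkan backend is a separate build with a different flag. Again a sketch based on llama.cpp's build docs, not the exact builds tested here:

```bash
# Sketch: a Vulkan build of llama.cpp for comparison against ROCm.
# Requires the Vulkan SDK/headers; no GFX version override is needed.
cmake -B build-vulkan -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build-vulkan -j
```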

 

```
llama-bench -ngl 99 -fa 1 -d 0,4096,8192,16384 -m [model]
```

In the tables below, pp512 measures prompt processing of a 512-token prompt and tg128 measures generation of 128 tokens; "@ dN" means the same test repeated with the context already filled to a depth of N tokens.

 

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | 1 | pp512 | 514.88 ± 4.82 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | 1 | tg128 | 19.27 ± 0.00 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | 1 | pp512 @ d4096 | 288.95 ± 3.71 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | 1 | tg128 @ d4096 | 11.59 ± 0.00 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | 1 | pp512 @ d8192 | 183.77 ± 2.49 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | 1 | tg128 @ d8192 | 8.36 ± 0.00 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | 1 | pp512 @ d16384 | 100.00 ± 1.45 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | ROCm | 99 | 1 | tg128 @ d16384 | 5.49 ± 0.00 |

 

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp512 | 575.41 ± 8.62 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | tg128 | 28.34 ± 0.01 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp512 @ d4096 | 390.27 ± 5.73 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | tg128 @ d4096 | 16.25 ± 0.01 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp512 @ d8192 | 303.25 ± 4.06 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | tg128 @ d8192 | 10.09 ± 0.00 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp512 @ d16384 | 210.54 ± 2.23 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | tg128 @ d16384 | 6.11 ± 0.00 |

 

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 1 | pp512 | 217.08 ± 3.58 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 1 | tg128 | 20.14 ± 0.01 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 1 | pp512 @ d4096 | 174.96 ± 3.57 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 1 | tg128 @ d4096 | 11.22 ± 0.00 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 1 | pp512 @ d8192 | 143.78 ± 1.36 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 1 | tg128 @ d8192 | 6.88 ± 0.00 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 1 | pp512 @ d16384 | 109.48 ± 1.07 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 1 | tg128 @ d16384 | 4.13 ± 0.00 |

 

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3vlmoe 30B.A3B Q6_K | 23.36 GiB | 30.53 B | ROCm | 99 | 1 | pp512 | 265.07 ± 3.95 |
| qwen3vlmoe 30B.A3B Q6_K | 23.36 GiB | 30.53 B | ROCm | 99 | 1 | tg128 | 25.83 ± 0.00 |
| qwen3vlmoe 30B.A3B Q6_K | 23.36 GiB | 30.53 B | ROCm | 99 | 1 | pp512 @ d4096 | 168.86 ± 1.58 |
| qwen3vlmoe 30B.A3B Q6_K | 23.36 GiB | 30.53 B | ROCm | 99 | 1 | tg128 @ d4096 | 6.01 ± 0.00 |
| qwen3vlmoe 30B.A3B Q6_K | 23.36 GiB | 30.53 B | ROCm | 99 | 1 | pp512 @ d8192 | 124.47 ± 0.68 |
| qwen3vlmoe 30B.A3B Q6_K | 23.36 GiB | 30.53 B | ROCm | 99 | 1 | tg128 @ d8192 | 3.41 ± 0.00 |
| qwen3vlmoe 30B.A3B Q6_K | 23.36 GiB | 30.53 B | ROCm | 99 | 1 | pp512 @ d16384 | 81.27 ± 0.46 |
| qwen3vlmoe 30B.A3B Q6_K | 23.36 GiB | 30.53 B | ROCm | 99 | 1 | tg128 @ d16384 | 2.10 ± 0.00 |

 

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3next 80B.A3B Q6_K | 63.67 GiB | 79.67 B | ROCm | 99 | 1 | pp512 | 138.44 ± 1.52 |
| qwen3next 80B.A3B Q6_K | 63.67 GiB | 79.67 B | ROCm | 99 | 1 | tg128 | 12.45 ± 0.00 |
| qwen3next 80B.A3B Q6_K | 63.67 GiB | 79.67 B | ROCm | 99 | 1 | pp512 @ d4096 | 131.49 ± 1.24 |
| qwen3next 80B.A3B Q6_K | 63.67 GiB | 79.67 B | ROCm | 99 | 1 | tg128 @ d4096 | 10.46 ± 0.00 |
| qwen3next 80B.A3B Q6_K | 63.67 GiB | 79.67 B | ROCm | 99 | 1 | pp512 @ d8192 | 122.66 ± 1.85 |
| qwen3next 80B.A3B Q6_K | 63.67 GiB | 79.67 B | ROCm | 99 | 1 | tg128 @ d8192 | 8.80 ± 0.00 |
| qwen3next 80B.A3B Q6_K | 63.67 GiB | 79.67 B | ROCm | 99 | 1 | pp512 @ d16384 | 107.32 ± 1.59 |
| qwen3next 80B.A3B Q6_K | 63.67 GiB | 79.67 B | ROCm | 99 | 1 | tg128 @ d16384 | 6.73 ± 0.00 |

 

So, am I satisfied with the system? Yes, it performs around what I was hoping for. Power draw is 10-13 W at idle with gpt-oss 120B loaded; inference brings that up to around 75 W. As an added bonus, the system is so quiet I had to check whether the fan was actually running the first time I started it.

The shared memory means it's possible to run Q8+ quants of many models and keep the KV cache at f16 or above for higher-quality outputs. With 120-something GB available it's also possible to keep more than one model loaded; personally I've been running Qwen3-VL-30B-A3B-Instruct as a visual assistant for gpt-oss 120B, a combo I've found very handy for transcribing handwritten letters for translation.
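A rough sketch of what that two-model setup can look like with llama-server; the model filenames, mmproj file, and ports below are placeholders rather than my exact configuration:

```bash
# Sketch: two llama-server instances sharing the iGPU's pool of system RAM.
# gpt-oss 120B handles text; Qwen3-VL acts as the vision assistant.
llama-server -m gpt-oss-120b-mxfp4.gguf -ngl 99 --port 8080 &

# Vision models need their multimodal projector file alongside the weights.
llama-server -m Qwen3-VL-30B-A3B-Instruct-Q6_K.gguf \
    --mmproj qwen3-vl-mmproj.gguf -ngl 99 --port 8081 &
```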

Token generation isn't stellar, as expected for a dual-channel system, but it's acceptable for MoE one-shots, and this is a secondary system that can chug along while I do something else. There's also the option of using one of the two M.2 slots for an OCuLink eGPU for increased performance.

Another perk is portability: at 130 × 126 × 52.3 mm it fits easily into a backpack or suitcase.

So, do I recommend this system? Unfortunately no, and that's solely due to current RAM and hardware prices. I suspect assembling the same system today would cost at least three times as much, making the price/performance ratio considerably less appealing.

Disclaimer: I'm not an experienced Linux user, so there's likely some performance left on the table.