Promising-looking results on 24GB VRAM, folks!

#3 opened by ubergarm

Good thread with some MMLU-Pro benchmarks over on r/LocalLLaMA: https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/comment/lnxcfzg/?context=3

I still want to find the "knee point" between the 72B and 32B quants...

A summary of Qwen2.5 models' performance (by parameter count and quant) on the MMLU-Pro Computer Science benchmark, as submitted by redditors in u/AaronFeng47's great recent post.

| Model | Quant | File Size (GB) | MMLU-Pro Computer Science | Source |
|---|---|---|---|---|
| 14B | ??? | ??? | 60.49 | Additional_test_758 |
| 32B | 4bit AWQ | 19.33 | 75.12 | russianguy |
| 32B | Q4_K_L-iMatrix | 20.43 | 72.93 | AaronFeng47 |
| 32B | Q4_K_M | 18.50 | 71.46 | AaronFeng47 |
| 32B | Q3_K_M | 14.80 | 72.93 | AaronFeng47 |
| 32B | Q3_K_M | 14.80 | 73.41 | VoidAlchemy |
| 32B | IQ4_XS | 17.70 | 73.17 | soulhacker |
| 72B | IQ3_XXS | 31.85 | 77.07 | VoidAlchemy |
| Gemma2-27B-it | Q8_0 | 29.00 | 58.05 | AaronFeng47 |

References

https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/comment/lnxcfzg/
https://www.reddit.com/r/LocalLLaMA/comments/1flfh0p/comment/lo7nppj/
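As a quick way to eyeball that "knee point" between the 32B and 72B quants, here's a minimal plotting sketch using the numbers from the table above (matplotlib is assumed to be installed; the data is copied by hand from the redditor-reported results, nothing here comes from the benchmark harness itself):

```python
# Quick sketch: MMLU-Pro CS score vs. quant file size, using the table above.
# Assumes matplotlib is installed; data is hand-copied from the reported results.
import matplotlib.pyplot as plt

results = [
    # (label, file size in GB, MMLU-Pro CS score)
    ("32B 4bit AWQ",       19.33, 75.12),
    ("32B Q4_K_L-iMatrix", 20.43, 72.93),
    ("32B Q4_K_M",         18.50, 71.46),
    ("32B Q3_K_M",         14.80, 72.93),
    ("32B Q3_K_M (rerun)", 14.80, 73.41),
    ("32B IQ4_XS",         17.70, 73.17),
    ("72B IQ3_XXS",        31.85, 77.07),
    ("Gemma2-27B-it Q8_0", 29.00, 58.05),
]

sizes = [r[1] for r in results]
scores = [r[2] for r in results]

fig, ax = plt.subplots()
ax.scatter(sizes, scores)
for label, size, score in results:
    ax.annotate(label, (size, score), fontsize=8)
ax.set_xlabel("File size (GB)")
ax.set_ylabel("MMLU-Pro Computer Science score")
ax.set_title("Score vs. size for reported quants")
plt.show()
```

Plotting score against size makes it easier to judge whether the 72B IQ3_XXS point is really worth the extra ~12 GB over the 32B quants.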

Man, if we could find a way to distribute MMLU-Pro testing... that would be so cool, haha.
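Purely hypothetical sketch of what that could look like (this is not something Ollama-MMLU-Pro supports out of the box, just the obvious "split and merge" idea): shard the 410 question IDs across machines and combine the per-shard tallies afterwards.

```python
# Hypothetical sharding helpers for splitting an MMLU-Pro subset across machines.
# Not part of Ollama-MMLU-Pro; just an illustration of round-robin splitting + merging.

def shard(question_ids, num_workers, worker_index):
    """Return the slice of question IDs this worker should run (round-robin split)."""
    return [q for i, q in enumerate(question_ids) if i % num_workers == worker_index]

def merge(per_worker_results):
    """Combine (num_correct, num_answered) tallies from each worker into one accuracy."""
    correct = sum(c for c, _ in per_worker_results)
    answered = sum(n for _, n in per_worker_results)
    return correct / answered if answered else 0.0

if __name__ == "__main__":
    question_ids = list(range(410))  # size of the CS subset mentioned in this thread
    # e.g. worker 0 of 4 would run 103 questions (workers 2 and 3 get 102 each):
    my_questions = shard(question_ids, num_workers=4, worker_index=0)
    print(len(my_questions), "questions on this worker")
    # After each worker reports (correct, answered), merge into a single accuracy
    # (the tallies below are made-up example numbers):
    print(merge([(75, 103), (77, 103), (74, 102), (76, 102)]))
```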

Q3 better than Q4 ??? Craziness

Yeah, some interesting results for sure. Though I personally take all these benchmarks with a big grain of salt, then throw some salt over my shoulder for good measure too... lol...

It's just a single benchmark with 410 questions that allows random guessing, so it's hard to extrapolate whether Q3 would actually beat Q4 on your exact data set, etc.
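To put a rough number on that noise: with only 410 questions, a simple normal-approximation 95% interval around a ~73% score is about ±4 points, which is wider than most of the gaps in the table. Back-of-envelope sketch (my own math, not part of any benchmark script):

```python
# Back-of-envelope: how much sampling noise is in a 410-question MMLU-Pro run?
# Uses a normal-approximation (Wald) 95% interval; just a rough estimate.
import math

n = 410          # questions in the MMLU-Pro Computer Science subset
p = 0.7341       # e.g. the 32B Q3_K_M score reported above

half_width = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"{p:.2%} +/- {half_width:.2%}")   # roughly 73.41% +/- 4.3%
```

So a one- or two-point gap between a Q3 and a Q4 quant is well within sampling noise on a test this size.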

Fun stuff though, and Ollama-MMLU-Pro makes it easier than ever to try it yourself!
