Promising-looking results for 24GB VRAM folks!
Good thread with some MMLU-Pro benchmarks over on r/LocalLLaMA: https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/comment/lnxcfzg/?context=3
I still want to find the "knee point" between the 72B and 32B quants...
A summary of Qwen2.5 model size and quant performance on the MMLU-Pro Computer Science benchmark, as submitted by redditors in u/AaronFeng47's great recent post.
| Model Parameters | Quant | File Size (GB) | MMLU-Pro Computer Science | Source |
|---|---|---|---|---|
| 14B | ??? | ??? | 60.49 | Additional_test_758 |
| 32B | 4bit AWQ | 19.33 | 75.12 | russianguy |
| 32B | Q4_K_L-iMatrix | 20.43 | 72.93 | AaronFeng47 |
| 32B | Q4_K_M | 18.50 | 71.46 | AaronFeng47 |
| 32B | Q3_K_M | 14.80 | 72.93 | AaronFeng47 |
| 32B | Q3_K_M | 14.80 | 73.41 | VoidAlchemy |
| 32B | IQ4_XS | 17.70 | 73.17 | soulhacker |
| 72B | IQ3_XXS | 31.85 | 77.07 | VoidAlchemy |
| Gemma2-27B-it | Q8_0 | 29.00 | 58.05 | AaronFeng47 |
References:
- https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/comment/lnxcfzg/
- https://www.reddit.com/r/LocalLLaMA/comments/1flfh0p/comment/lo7nppj/
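One rough way to eyeball the 72B-vs-32B "knee point" is score per GB, using the file sizes and scores from the table above. This is just a crude heuristic sketch, not a proper Pareto analysis, and the model labels are shorthand for the table rows:

```python
# Score-per-GB from the table above (crude heuristic for the quant trade-off).
# Keys are shorthand labels for rows in the table; values are (file size GB, score).
results = {
    "32B 4bit AWQ": (19.33, 75.12),
    "32B Q4_K_M":   (18.50, 71.46),
    "32B Q3_K_M":   (14.80, 72.93),
    "72B IQ3_XXS":  (31.85, 77.07),
}

for name, (size_gb, score) in results.items():
    print(f"{name}: {score / size_gb:.2f} points/GB")
```

By this crude measure the smaller 32B quants look most "efficient", while the 72B IQ3_XXS buys its extra ~2-4 points at a much higher VRAM cost.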
Man, if we could find a way to distribute MMLU-Pro testing... that would be so cool haha.
Q3 better than Q4 ??? Craziness
Yeah, some interesting results for sure. Though I personally take all these benchmarks with a big grain of salt, then throw some salt over my shoulder for good measure too... lol...
It's just a single benchmark with 410 questions that allows random guesses, so it's hard to extrapolate whether Q3 would actually beat Q4 on your exact data set, etc.
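The noise argument above can be put in rough numbers. A back-of-the-envelope sketch, assuming each of the 410 questions is an independent binomial trial (which ignores question difficulty clustering and guessing effects, so treat it as a lower bound on the uncertainty):

```python
import math

def mmlu_pro_standard_error(accuracy_pct: float, n_questions: int = 410) -> float:
    """Binomial standard error of a benchmark score, in percentage points."""
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_questions)

# Example: a 32B quant scoring ~73% on the 410-question CS subset
se = mmlu_pro_standard_error(73.0)
print(f"standard error: ~{se:.1f} percentage points")
```

That works out to a standard error of roughly 2 percentage points, so gaps like Q3_K_M's 72.93 vs Q4_K_M's 71.46 are well within run-to-run noise.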
Fun stuff though, and Ollama-MMLU-Pro makes it easier than ever to try it yourself!