Promising-looking results for 24GB VRAM folks!
Good thread with some MMLU-Pro benchmarks over on r/LocalLLaMA: https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/comment/lnxcfzg/?context=3
I still want to find the "knee point" between the 72B and 32B quants...
A summary of Qwen2.5 model size and quant performance on the MMLU-Pro Computer Science benchmark, as submitted by redditors in u/AaronFeng47's great recent post.
| Model Parameters | Quant | File Size (GB) | MMLU-Pro Computer Science | Source |
|---|---|---|---|---|
| 14B | ??? | ??? | 60.49 | Additional_test_758 |
| 32B | 4bit AWQ | 19.33 | 75.12 | russianguy |
| 32B | Q4_K_L-iMatrix | 20.43 | 72.93 | AaronFeng47 |
| 32B | Q4_K_M | 18.50 | 71.46 | AaronFeng47 |
| 32B | Q3_K_M | 14.80 | 72.93 | AaronFeng47 |
| 32B | Q3_K_M | 14.80 | 73.41 | VoidAlchemy |
| 32B | IQ4_XS | 17.70 | 73.17 | soulhacker |
| 72B | IQ3_XXS | 31.85 | 77.07 | VoidAlchemy |
| Gemma2-27B-it | Q8_0 | 29.00 | 58.05 | AaronFeng47 |
References:
- https://www.reddit.com/r/LocalLLaMA/comments/1fkm5vd/comment/lnxcfzg/
- https://www.reddit.com/r/LocalLLaMA/comments/1flfh0p/comment/lo7nppj/
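One rough way to eyeball the 72B-vs-32B "knee point" is score per GB, using the file sizes and scores from the table above. This is just a crude heuristic sketch, not a proper Pareto analysis, and the model labels are shorthand for the table rows:

```python
# Score-per-GB from the table above (crude heuristic for the quant trade-off).
# Keys are shorthand labels for rows in the table; values are (file size GB, score).
results = {
    "32B 4bit AWQ": (19.33, 75.12),
    "32B Q4_K_M":   (18.50, 71.46),
    "32B Q3_K_M":   (14.80, 72.93),
    "72B IQ3_XXS":  (31.85, 77.07),
}

for name, (size_gb, score) in results.items():
    print(f"{name}: {score / size_gb:.2f} points/GB")
```

By this crude measure the smaller 32B quants look most "efficient", while the 72B IQ3_XXS buys its extra ~2-4 points at a much higher VRAM cost.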
Man, if we could find a way to distribute MMLU-Pro testing... that would be so cool haha.
Q3 better than Q4 ??? Craziness
Yeah, some interesting results for sure. Though I personally take all these benchmarks with a big grain of salt, then throw some salt over my shoulder for good measure too... lol...
It's just a single benchmark with 410 questions that allows random guesses, so it's hard to extrapolate whether Q3 would actually beat Q4 on your exact data set, etc.
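The noise argument above can be put in rough numbers. A back-of-the-envelope sketch, assuming each of the 410 questions is an independent binomial trial (which ignores question difficulty clustering and guessing effects, so treat it as a lower bound on the uncertainty):

```python
import math

def mmlu_pro_standard_error(accuracy_pct: float, n_questions: int = 410) -> float:
    """Binomial standard error of a benchmark score, in percentage points."""
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_questions)

# Example: a 32B quant scoring ~73% on the 410-question CS subset
se = mmlu_pro_standard_error(73.0)
print(f"standard error: ~{se:.1f} percentage points")
```

That works out to a standard error of roughly 2 percentage points, so gaps like Q3_K_M's 72.93 vs Q4_K_M's 71.46 are well within run-to-run noise.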
Fun stuff though, and Ollama-MMLU-Pro makes it easier than ever to try it yourself!