is it ggufable?

#1
by sopack - opened

Since most of us mortals don't have huge amounts of VRAM, it'd be cool to GGUF this model as well.
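For anyone who wants to roll their own, here's a rough sketch of the usual llama.cpp flow (the script and binary names vary between llama.cpp releases, so treat them as assumptions):

```python
# Hypothetical sketch: convert an HF checkpoint to GGUF and quantize it
# with llama.cpp tooling. Paths and tool names are assumptions and
# differ across llama.cpp releases.
import subprocess

model_dir = "models/dolphin"                    # assumed local HF checkpoint
f16_path = f"{model_dir}/model-f16.gguf"
q5_path = f"{model_dir}/model-Q5_K_M.gguf"

# 1) Convert the HF weights to a full-precision GGUF file.
subprocess.run(
    ["python", "convert.py", model_dir, "--outfile", f16_path],
    check=True,
)

# 2) Quantize it down to Q5_K_M (roughly 5.5 bits per weight).
subprocess.run(["./quantize", f16_path, q5_path, "Q5_K_M"], check=True)
```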

Cognitive Computations org

I wanna see a QuIP# version

https://github.com/Cornell-RelaxML/quip-sharp

I like the idea of QuIP# because the result would be so tiny that it would run on CPU even without GGML, but from what I remember of the paper, 2-bit quantization works less well for smaller models. It might still be better than 3-bit GPTQ, but I think we get more performance out of a 5-6 bit GGUF version.
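Some back-of-the-envelope math on why the bit width dominates file size (pure arithmetic, assuming a 7B-parameter model and ignoring per-block scales and metadata):

```python
# Rough weight-file size at different quantization bit widths,
# assuming 7e9 parameters; real GGUF files are a bit larger
# because of per-block scales and metadata.
params = 7e9

for bits in (2, 3, 5, 6, 16):
    gib = params * bits / 8 / 2**30
    print(f"{bits:>2}-bit: ~{gib:.1f} GiB")
```

So 2-bit lands around 1.6 GiB versus roughly 4-5 GiB for the 5-6 bit quants, which is the size/quality trade-off in question.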
Note that TheBloke just published a GGUF version :)
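If you grab that, a minimal sketch of running it on CPU with llama-cpp-python (the filename here is an assumption; substitute whichever quant fits your RAM):

```python
# Minimal sketch: load a GGUF quant on CPU via llama-cpp-python.
# The model path is hypothetical -- point it at the downloaded file.
from llama_cpp import Llama

llm = Llama(model_path="dolphin.Q5_K_M.gguf", n_ctx=2048)
out = llm("Why is the sky blue?", max_tokens=64)
print(out["choices"][0]["text"])
```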

ehartford changed discussion status to closed
