question about quants

#12
by prudant - opened

Can this kind of "LLM" for embeddings be quantized, for example to AWQ or GPTQ format?
regards!

Alibaba-NLP org

Indeed, gte embedding models can be quantized to reduce their computational requirements and memory footprint.
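One concrete way to get started, separate from weight-quantization formats like AWQ/GPTQ (which need their own tooling), is to scalar-quantize the embedding vectors the model produces. A minimal NumPy sketch of symmetric int8 quantization, assuming you already have float32 embeddings from a gte model (the 1024-dim random vector below is just a stand-in):

```python
import numpy as np

def quantize_int8(v: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric scalar quantization: map float32 values to int8 with one scale."""
    scale = float(np.abs(v).max()) / 127.0  # largest magnitude maps to +/-127
    q = np.round(v / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
emb = rng.standard_normal(1024).astype(np.float32)  # stand-in for a gte embedding

q, scale = quantize_int8(emb)
approx = dequantize_int8(q, scale)

print(q.nbytes, emb.nbytes)  # int8 uses 1/4 the storage of float32
```

Similarity search usually tolerates this well because the per-element rounding error is bounded by half the scale, so cosine similarities barely move.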

Can you give me a little info on how to get started with that? Which format, library, or useful starting point, please!

I am planning to quantize from 4-byte floats to 2-byte floats so that the vectors fit under pgvector's 2k limit: https://jkatz05.com/post/postgres/pgvector-scalar-binary-quantization/
I can report back on whether that works.
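The 4-byte-to-2-byte step above is just a float32 → float16 cast on the application side (pgvector's halfvec type stores 2 bytes per dimension). A small sketch, where the 3072-dim size is only an illustrative assumption:

```python
import numpy as np

dim = 3072  # illustrative dimension; substitute your model's output size
v32 = np.random.default_rng(1).standard_normal(dim).astype(np.float32)

# Cast to 2-byte floats (what pgvector's halfvec stores per dimension).
v16 = v32.astype(np.float16)

print(v32.nbytes, v16.nbytes)  # storage halves: 12288 -> 6144 bytes
```

Round-tripping back to float32 shows the cast loses only low-order precision, which is usually negligible for nearest-neighbor search.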
