unable to load 4-bit quantized variant with llama.cpp

#31
by sunnykusawa - opened

Getting this error:
numpy._core._exceptions._ArrayMemoryError: Unable to allocate 62.6 GiB for an array with shape (131072, 128256) and data type float32

I am using a 4-bit quantized LLM, so why is it trying to allocate 62.6 GiB for an array with shape (131072, 128256) and data type float32?
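For reference, the reported size is exactly what a float32 array of that shape requires, regardless of weight quantization. (The shape looks like context length × vocabulary size, which suggests a full-precision buffer allocated during conversion rather than the quantized weights themselves; that interpretation is an assumption, not confirmed by the error alone.) A quick check of the arithmetic:

```python
# Verify the allocation size in the error message:
# a (131072, 128256) array of float32 (4 bytes per element).
rows, cols = 131072, 128256
bytes_needed = rows * cols * 4          # total bytes for the array
gib = bytes_needed / (1024 ** 3)        # convert bytes to GiB
print(f"{gib:.1f} GiB")                 # matches the 62.6 GiB in the error
```

So the 62.6 GiB figure is driven purely by the array's shape and float32 dtype, not by the 4-bit quantization of the model weights.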