unable to load 4-bit quantized variant with llama.cpp

#31
by sunnykusawa - opened

Getting this error:
numpy._core._exceptions._ArrayMemoryError: Unable to allocate 62.6 GiB for an array with shape (131072, 128256) and data type float32

I am using a 4-bit quantized LLM, so why is it trying to allocate 62.6 GiB for an array with shape (131072, 128256) and data type float32?
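For reference, the reported size is exactly what a float32 array of that shape requires, regardless of weight quantization. (The shape looks like context length × vocabulary size, which suggests a full-precision buffer allocated during conversion rather than the quantized weights themselves; that interpretation is an assumption, not confirmed by the error alone.) A quick check of the arithmetic:

```python
# Verify the allocation size in the error message:
# a (131072, 128256) array of float32 (4 bytes per element).
rows, cols = 131072, 128256
bytes_needed = rows * cols * 4          # total bytes for the array
gib = bytes_needed / (1024 ** 3)        # convert bytes to GiB
print(f"{gib:.1f} GiB")                 # matches the 62.6 GiB in the error
```

So the 62.6 GiB figure is driven purely by the array's shape and float32 dtype, not by the 4-bit quantization of the model weights.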