error when loading the model

#1
by StefanStroescu - opened

Hi,

I am trying to load the model with llama.cpp (via the llama-cpp-python bindings), but I am getting this error:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 197, got 195
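
For reference, the number of tensors actually stored in the file can be checked directly against what the loader reports. A minimal sketch, assuming the gguf Python package (from the llama.cpp repo, also on PyPI) and the model path used below:

from gguf import GGUFReader  # pip install gguf

# Read the GGUF header and count the tensors stored in the file.
reader = GGUFReader("./models/phi3-128k/Phi-3-mini-128k-instruct-Q4_K_M.gguf")
print(f"tensors in file: {len(reader.tensors)}")  # the error above reports 195

# The 128k variant carries extra rope-scaling data that older
# llama.cpp builds do not account for, which can produce exactly
# this kind of expected/actual tensor-count mismatch.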

Code:

from llama_cpp import Llama

llm_n_gpu_layers = -1  # -1 offloads all layers to the GPU
llm_split_mode = 0     # LLAMA_SPLIT_MODE_NONE: run on a single GPU
llm_main_gpu = 0       # index of the GPU to use

llm = Llama(
    model_path="./models/phi3-128k/Phi-3-mini-128k-instruct-Q4_K_M.gguf",
    n_gpu_layers=llm_n_gpu_layers,
    n_ctx=3072,
    chat_format="phi-3-chat",
    offload_kqv=True,  # keep the KV cache on the GPU
    split_mode=llm_split_mode,
    main_gpu=llm_main_gpu,
)

I can load and use the 4k-context model in GGUF format (https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf), but not the 128k version.
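
Since llama.cpp gained support for the 128k variant's long-context rope scaling later than for the 4k model, it may also be worth checking which llama-cpp-python build is installed. A minimal sketch (the upgrade command assumes the package was installed with pip):

import llama_cpp

# Older binding versions predate support for the extra
# rope-scaling data in the 128k model and can fail at load time.
print(llama_cpp.__version__)

# If the version is old, upgrading may resolve the mismatch:
#   pip install --upgrade llama-cpp-python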

Any hint or advice would be very helpful.
Thanks
