Failed to create inference endpoint
Issue:
I cannot start the inference endpoint; the log says:
2023/12/07 10:53:21 ~ Error: ShardCannotStart
2023/12/07 10:53:21 ~ {"timestamp":"2023-12-07T01:53:21.369939Z","level":"ERROR","fields":{"message":"Shard 0 failed to start"},"target":"text_generation_launcher"}
2023/12/07 10:53:21 ~ {"timestamp":"2023-12-07T01:53:21.369962Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
Steps to reproduce:
Deploy
> Inference Endpoint
> Select A10G AWS instance
Is there a way to use an inference endpoint with this LoRA model?
Thanks in advance!
Hi
@brekk
I am not sure Inference Endpoints supports LoRA adapters directly; you should consider using the merged model (which I believe is https://huggingface.co/alignment-handbook/zephyr-7b-sft-full, right
@lewtun
?). If not, you can merge the model yourself; please have a look at https://huggingface.co/docs/peft/v0.7.0/en/package_reference/lora#peft.LoraModel.merge_and_unload. To merge the LoRA model you can just do:
from peft import AutoPeftModelForCausalLM

peft_model_id = YOUR_LORA_MODEL_ID      # repo that holds the LoRA adapter
merged_model_id = YOUR_NEW_MODEL_ID     # repo to push the merged weights to

# Load the base model with the adapter applied, fold the adapter
# weights into the base weights, then push the standalone model
model = AutoPeftModelForCausalLM.from_pretrained(peft_model_id)
merged_model = model.merge_and_unload()
merged_model.push_to_hub(merged_model_id)