Problem with the Chat Completion API for google/gemma-2-2b-it

#40
by adelamare-blockchain - opened

Hi!

I'm facing a big problem: the google/gemma-2-2b-it API does not seem to work for "Chat Completion". The official Hugging Face documentation for Chat Completion gives this call:

```
curl 'https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions' \
-H "Authorization: Bearer hf_***" \
-H 'Content-Type: application/json' \
-d '{
    "model": "google/gemma-2-2b-it",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 500,
    "stream": false
}'
```
However, the address https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions returns:

```
// 20240918223200
// https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions

{
  "error": "Model google/gemma-2-2b-it/v1/chat/completions does not exist"
}
```
Which API should I use to call google/gemma-2-2b-it for chat completion, please?
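For reference, the documented call can be reproduced as a minimal Python sketch using only the standard library; it builds the request without sending it. Reading the token from an `HF_TOKEN` environment variable is an assumption here (`hf_***` is only a placeholder). One thing the sketch makes explicit: a chat completion is a POST with a JSON body, whereas typing the URL into a browser sends a GET with no body, so that difference may be worth ruling out before concluding the route is missing.

```python
import json
import os
import urllib.request

# Endpoint copied from the documented curl command.
CHAT_URL = (
    "https://api-inference.huggingface.co/models/"
    "google/gemma-2-2b-it/v1/chat/completions"
)


def build_chat_request(question: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the OpenAI-style chat completion request."""
    body = {
        "model": "google/gemma-2-2b-it",
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 500,
        "stream": False,
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",  # a browser address bar would issue a GET instead
    )


# HF_TOKEN is an assumed variable name; "hf_***" is only a placeholder.
req = build_chat_request(
    "What is the capital of France?", os.environ.get("HF_TOKEN", "hf_***")
)
# To actually send it (requires network access and a valid token):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```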

Thanks!

Google org

Hi @adelamare-blockchain ,

I was able to reproduce the issue. To resolve it, please use the following API endpoint: https://api-inference.huggingface.co/models/google/gemma-2-2b-it and refer to the corrected code below:

[Screenshot: corrected code]
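As a rough illustration of what a call to the bare model endpoint looks like, here is a minimal Python sketch assuming the standard text-generation payload (a single `inputs` string plus an optional `parameters` object). The sampling values are taken from the request body quoted later in this thread, with `max_tokens` renamed to `max_new_tokens`; `HF_TOKEN` is an assumed environment variable name. The request is built but not sent.

```python
import json
import os
import urllib.request

MODEL_URL = "https://api-inference.huggingface.co/models/google/gemma-2-2b-it"


def build_generation_request(prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a text-generation request for the model endpoint."""
    body = {
        "inputs": prompt,  # one string, not a list of chat messages
        "parameters": {
            "max_new_tokens": 500,
            "temperature": 0.7,
            "top_p": 0.95,
            "repetition_penalty": 1.15,
            "return_full_text": False,  # return only the generated continuation
        },
    }
    return urllib.request.Request(
        MODEL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# HF_TOKEN is an assumed variable name; "hf_***" is only a placeholder.
req = build_generation_request(
    "What is the capital of France?", os.environ.get("HF_TOKEN", "hf_***")
)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # typically a list of {"generated_text": ...} objects
```

Because `inputs` is plain text, a multi-turn conversation has to be flattened into the model's prompt format before it is sent this way.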

Thank you.

Thanks @GopiUppari for your answer.

Yes, it works for me this way. However, this solution seems to make a plain 'text-to-text' (text generation) API call.
Unfortunately, it is not a 'Chat completion', i.e. a conversation with google/gemma-2-2b-it.

It doesn't accept the `messages` field in this request body:

```
{
    "model": "google/gemma-2-2b-it",
    "messages": [
        {
            "role": "user",
            "content": "What is the best approach for integrating AI and blockchain technologies in a decentralized application?"
        }
    ],
    "max_tokens": 500,
    "temperature": 0.7,
    "top_p": 0.95,
    "repetition_penalty": 1.15,
    "stream": false
}
```
Yet the Chat Completion documentation says that the curl call to https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions (the one in my first message) should work. It doesn't: that endpoint returns a "does not exist" error.
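To make the mismatch concrete: the chat-style body above and the text-generation body expected by the bare endpoint carry the same sampling settings, partly under different keys. The hypothetical helper below (`chat_body_to_generation_body` is not part of any library) sketches that mapping, assuming `max_tokens` corresponds to `max_new_tokens`. The `messages` list cannot be passed through; it has to be rendered into one prompt string first, and keys such as `model` and `stream` are simply dropped in this sketch.

```python
from typing import Any


def chat_body_to_generation_body(chat_body: dict[str, Any], prompt: str) -> dict[str, Any]:
    """Hypothetical helper: re-key an OpenAI-style chat body for text generation.

    `prompt` must already be the conversation rendered as a single string
    (e.g. with the model's chat template); `messages` is not forwarded.
    """
    # Sampling settings that keep the same name in the text-generation payload.
    shared = ("temperature", "top_p", "repetition_penalty")
    params = {key: chat_body[key] for key in shared if key in chat_body}
    if "max_tokens" in chat_body:
        params["max_new_tokens"] = chat_body["max_tokens"]  # renamed key
    return {"inputs": prompt, "parameters": params}


chat_body = {
    "model": "google/gemma-2-2b-it",
    "messages": [
        {
            "role": "user",
            "content": "What is the best approach for integrating AI and "
            "blockchain technologies in a decentralized application?",
        }
    ],
    "max_tokens": 500,
    "temperature": 0.7,
    "top_p": 0.95,
    "repetition_penalty": 1.15,
    "stream": False,
}
gen_body = chat_body_to_generation_body(chat_body, prompt="<formatted prompt>")
```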

How can I call google/gemma-2-2b-it in conversational (chat) mode, please?

Thanks!

Google org

Hi @adelamare-blockchain ,


As shown in the documentation, passing the chat messages to the `tokenizer.apply_chat_template` function (with `tokenize=False`) returns a plain string (`<class 'str'>`) in the prompt format the model understands. You can send that same formatted string as the `inputs` value in the curl command so the model interprets the conversation correctly.
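For illustration, here is a dependency-free Python sketch of the string `apply_chat_template` is expected to produce for Gemma-2 instruction-tuned models, assuming the template that wraps each turn in `<start_of_turn>`/`<end_of_turn>` markers, names assistant turns `model`, and rejects a system role. The tokenizer's own template remains the authoritative version; `format_gemma_prompt` is a hypothetical name, meant only to show what the formatted `inputs` string looks like.

```python
def format_gemma_prompt(messages: list[dict[str, str]], add_bos: bool = True) -> str:
    """Hypothetical re-implementation of Gemma's chat template (sketch only).

    Intended to mirror tokenizer.apply_chat_template(messages, tokenize=False,
    add_generation_prompt=True) for Gemma-2 instruction-tuned models.
    """
    parts = ["<bos>"] if add_bos else []
    for message in messages:
        if message["role"] == "system":
            # Gemma's chat template has no system role.
            raise ValueError("System role not supported")
        # Assistant turns are labelled "model" in Gemma's format.
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    # Generation prompt: open a model turn so the model answers next.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = format_gemma_prompt(
    [{"role": "user", "content": "What is the capital of France?"}]
)
# prompt == "<bos><start_of_turn>user\nWhat is the capital of France?<end_of_turn>\n<start_of_turn>model\n"
```

The resulting string goes into the `inputs` field of the request to the model endpoint. The `add_bos` flag is there because the server-side tokenizer may add its own BOS token; whether it does is an assumption worth checking rather than relying on.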


Thank you.
