Problem with 'google/gemma-2-2b-it''s API for Chat completion

#40

by adelamare-blockchain - opened 6 days ago

6 days ago

Hi !

I've run into a blocking problem: the chat completion API for google/gemma-2-2b-it does not seem to work. The official Hugging Face documentation for 'Chat Completion' gives this command:

    curl 'https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions' \
      -H "Authorization: Bearer hf_***" \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "google/gemma-2-2b-it",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "max_tokens": 500,
        "stream": false
      }'

However, the address 'https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions' returns:

```
// 20240918223200
// https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions

{
  "error": "Model google/gemma-2-2b-it/v1/chat/completions does not exist"
}
```

Which API endpoint should I use to call google/gemma-2-2b-it for chat completion, please?

Thx !

GopiUppari

Google org 2 days ago

Hi @adelamare-blockchain ,

I was able to reproduce the issue. To resolve it, please use the following API endpoint: https://api-inference.huggingface.co/models/google/gemma-2-2b-it and refer to the corrected code below:
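As a rough sketch only (not necessarily the exact code intended here), a request to that endpoint could look like the following in Python. It assumes the endpoint takes a text-generation payload of the form `{"inputs": ..., "parameters": {...}}` and returns a list with a `generated_text` field; `build_request` and `generate` are hypothetical helper names, and the token is a placeholder.

```python
import json
import urllib.request

# Endpoint given in the reply above.
API_URL = "https://api-inference.huggingface.co/models/google/gemma-2-2b-it"


def build_request(prompt: str, token: str, max_new_tokens: int = 500) -> urllib.request.Request:
    """Build (but do not send) a POST request for the endpoint."""
    # Assumed payload shape: {"inputs": ..., "parameters": {...}}.
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(prompt: str, token: str) -> str:
    """Send the request; needs network access and a valid token."""
    with urllib.request.urlopen(build_request(prompt, token)) as resp:
        body = json.load(resp)
    # Assumed response shape: [{"generated_text": "..."}].
    return body[0]["generated_text"]
```

Calling `generate("What is the capital of France?", "hf_***")` with a real token would send the request; `build_request` alone can be inspected offline.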

Thank you.

adelamare-blockchain

2 days ago

Thx @GopiUppari for your answer.

Yes, it works for me this way. However, this solution behaves like a plain 'text-to-text' API call.
Unfortunately, it doesn't provide a 'Chat completion', i.e. a conversation with google/gemma-2-2b-it.

It doesn't accept the "messages" field from a request body like this one:

    {
      "model": "google/gemma-2-2b-it",
      "messages": [
        {
          "role": "user",
          "content": "What is the best approach for integrating AI and blockchain technologies in a decentralized application?"
        }
      ],
      "max_tokens": 500,
      "temperature": 0.7,
      "top_p": 0.95,
      "repetition_penalty": 1.15,
      "stream": false
    }

Indeed, the 'Chat completion' documentation says that the following should work:

    curl 'https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions' \
      -H "Authorization: Bearer hf_***" \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "google/gemma-2-2b-it",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "max_tokens": 500,
        "stream": false
      }'

But it didn't, because the endpoint https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/completions doesn't exist.

How can I make a conversational API call to google/gemma-2-2b-it, please?

Thx !

GopiUppari

Google org 1 day ago

Hi @adelamare-blockchain ,

In the documentation, passing the chat messages to the tokenizer.apply_chat_template function (with tokenize=False) returns a formatted string (<class 'str'>) that the model can interpret. You can use this same formatted string in the curl command so that the model reads the input as a conversation.
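To illustrate, here is a minimal sketch that builds such a string by hand. It assumes Gemma's turn layout (`<start_of_turn>`, `<end_of_turn>`, with the assistant role named `model`) as I understand it from the model's chat template; `format_gemma_prompt` is a hypothetical helper, and `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` remains the authoritative way to produce the string.

```python
from typing import Dict, List


def format_gemma_prompt(messages: List[Dict[str, str]]) -> str:
    """Hypothetical helper: render chat messages in Gemma's turn format."""
    parts = ["<bos>"]
    for message in messages:
        # Gemma's template names the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {message['role']}")
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    # Open a model turn so that generation continues as the reply.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = format_gemma_prompt(
    [{"role": "user", "content": "What is the capital of France?"}]
)
print(prompt)
```

The resulting string would then be sent as the prompt text in the request body. Whether the leading `<bos>` should be kept when sending through the API is worth verifying, since the server may add it on its own.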

Thank you.
