Fix invalid characters in template

#1
by Xenova HF staff - opened
from transformers import AutoTokenizer

EXAMPLE_CHAT = [
    { "role": "user", "content": "Hello, how are you?" },
    { "role": "assistant", "content": "I'm doing great. How can I help you today?" },
    { "role": "user", "content": "I'd like to show off how chat templating works!" },
]

tokenizer = AutoTokenizer.from_pretrained("YokaiKoibito/llama2_70b_chat_uncensored-fp16")
prompt = tokenizer.apply_chat_template(EXAMPLE_CHAT, tokenize=False)

results in

TemplateSyntaxError: unexpected char '‘' at 246

This PR should fix that.
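
For context, the error points at a typographic quote (‘, U+2018) where Jinja expects an ASCII quote. The snippet below is only a minimal sketch of how such characters could be normalized in a template string. The helper `normalize_quotes` and the sample template fragment are hypothetical, not taken from this repository, and a blanket replacement like this would also alter any curly quotes that are intentionally part of the prompt text.

```python
# Hypothetical helper (not part of transformers): map typographic quotes
# to the ASCII quotes that Jinja accepts as string delimiters.
_QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_quotes(template: str) -> str:
    """Return the template with curly quotes replaced by straight quotes."""
    return template.translate(_QUOTE_MAP)


# Illustrative fragment only; the real chat template differs.
broken = "{% if message[\u2018role\u2019] == \u2018user\u2019 %}"
fixed = normalize_quotes(broken)
print(fixed)  # {% if message['role'] == 'user' %}
```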

You can test it with:

from transformers import AutoTokenizer

EXAMPLE_CHAT = [
    { "role": "user", "content": "Hello, how are you?" },
    { "role": "assistant", "content": "I'm doing great. How can I help you today?" },
    { "role": "user", "content": "I'd like to show off how chat templating works!" },
]

tokenizer = AutoTokenizer.from_pretrained("YokaiKoibito/llama2_70b_chat_uncensored-fp16", revision='refs/pr/1')
prompt = tokenizer.apply_chat_template(EXAMPLE_CHAT, tokenize=False)
print(prompt)

which prints out:

<s>### HUMAN:
Hello, how are you?

### RESPONSE:
I'm doing great. How can I help you today?</s>

### HUMAN:
I'd like to show off how chat templating works!

### RESPONSE:
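
The layout of the printed prompt can be mimicked in plain Python, which may help when sanity-checking the template without loading the tokenizer. This is a rough sketch inferred only from the output above, not the actual Jinja template: the function name `render_chat`, the default `bos`/`eos` values, and the exact trailing whitespace are assumptions.

```python
# Rough re-implementation of the format seen in the printed prompt.
# Assumption: inferred from the output above, not from the real template.
def render_chat(messages, bos="<s>", eos="</s>"):
    parts = [bos]  # the BOS token appears once, at the very start
    for message in messages:
        if message["role"] == "user":
            # A user turn is followed by the header for the model's reply.
            parts.append(f"### HUMAN:\n{message['content']}\n\n### RESPONSE:\n")
        elif message["role"] == "assistant":
            # An assistant turn is closed with EOS and a blank line.
            parts.append(f"{message['content']}{eos}\n\n")
    return "".join(parts)


EXAMPLE_CHAT = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

print(render_chat(EXAMPLE_CHAT))
```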