batch tokenizer?

#106
by jjplane - opened

model_input = tokenizer(batch_messages, return_tensors="pt", padding=True, truncation=True).to("cuda")

This raises: ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}).

Hi @jjplane
Can you try to set a pad token as suggested by the error message?
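For example, a minimal sketch of the fix (the model id below is just a placeholder for illustration; substitute the checkpoint you are actually loading):

from transformers import AutoTokenizer

# Placeholder checkpoint for illustration; use your own model id.
model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Many causal-LM tokenizers ship without a pad token; reuse EOS for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Left padding is the usual choice for decoder-only generation.
tokenizer.padding_side = "left"

batch_messages = [
    "Hello, how are you?",
    "Summarize the plot of Hamlet in one sentence.",
]

model_input = tokenizer(
    batch_messages,
    return_tensors="pt",
    padding=True,
    truncation=True,
).to("cuda")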

OK, tokenizer.pad_token = tokenizer.eos_token worked, and </s> is now padded on the left. Thanks!

jjplane changed discussion status to closed