metadata
license: mit
language:
  - en
tags:
  - text generation
datasets:
  - fhswf/TinyStoriesV2_cleaned

BPE Tokenizer for TinyStoriesV2

Based on the GPT-Neo BPE tokenizer, but with a smaller vocabulary. Trained on TinyStoriesV2.

  • Vocabulary size: 2048
  • Base characters: 256
  • Extra tokens: 1 (<|endoftext|>)
  • Merges: 1791
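
The figures above add up as follows. This is a minimal sketch of the arithmetic only, assuming standard BPE behaviour in which each merge contributes exactly one new token; the names `BASE_CHARS`, `SPECIAL_TOKENS`, `MERGES`, and `vocab_size` are illustrative and not part of the tokenizer itself.

```python
# Vocabulary budget, using the numbers listed in this README.
BASE_CHARS = 256                      # base characters
SPECIAL_TOKENS = ["<|endoftext|>"]    # the single extra token
MERGES = 1791                         # learned merges (assumed: one new token each)


def vocab_size(base_chars: int, special_tokens: list[str], merges: int) -> int:
    """Return the total vocabulary size for a BPE tokenizer."""
    return base_chars + len(special_tokens) + merges


total = vocab_size(BASE_CHARS, SPECIAL_TOKENS, MERGES)
print(total)  # 256 + 1 + 1791 = 2048
```

Read the other way around, fixing the vocabulary at 2048 with 256 base characters and one special token leaves room for exactly 1791 merges.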