Aleph-Alpha/tfree-research-vocab-32k-fineweb-steps-370k-instruct-10k

This model supports our ongoing T-Free research. It is publicly available under the Open Aleph License, which explicitly allows non-commercial research and educational use.

The model was trained for 1 epoch on the fineweb-edu dataset with a sequence length of 4k and a batch size of 1k. Training was then continued with llama-style instruction finetuning.

It has an embedding layer of dimension 32k, a vocab population of 10 (activations per trigram), and a partial lowercase overlap of 2 (i.e. 2 of the 10 activations overlap with those of the trigram's lowercase counterpart). The model aggregates all activations by summation only.
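To make these numbers concrete, here is a minimal sketch of how a word could be mapped to embedding rows under this configuration. It is not the code from the trigrams repository. The hash function (`sha256`), the exact table size (`32_000` for "32k"), the whitespace padding of words, and the names `trigram_activations`, `word_trigrams`, and `embed_word` are all assumptions made for illustration.

```python
import hashlib

VOCAB_SIZE = 32_000      # assumed exact value of "32k"
POPULATION = 10          # activations per trigram
LOWERCASE_OVERLAP = 2    # activations shared with the lowercased trigram


def _hash_to_index(text: str, salt: int) -> int:
    """Map (text, salt) to a row index. The hash choice is illustrative only."""
    digest = hashlib.sha256(f"{salt}:{text}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % VOCAB_SIZE


def trigram_activations(trigram: str) -> list[int]:
    """Return POPULATION row indices for one trigram."""
    # The first LOWERCASE_OVERLAP indices depend only on the lowercased
    # trigram, so "Hel" and "hel" share them.
    shared = [_hash_to_index(trigram.lower(), s) for s in range(LOWERCASE_OVERLAP)]
    # The remaining indices depend on the exact casing.
    own = [_hash_to_index(trigram, s) for s in range(LOWERCASE_OVERLAP, POPULATION)]
    return shared + own


def word_trigrams(word: str) -> list[str]:
    """Split a word into character trigrams (whitespace padding is assumed)."""
    padded = f" {word} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def embed_word(word: str, table: list[list[float]]) -> list[float]:
    """Aggregate all activated embedding rows of a word by summation."""
    out = [0.0] * len(table[0])
    for trigram in word_trigrams(word):
        for idx in trigram_activations(trigram):
            for d, value in enumerate(table[idx]):
                out[d] += value
    return out


if __name__ == "__main__":
    # Toy table with 2 hidden dimensions; a real model learns these rows.
    toy_table = [[1.0, 1.0] for _ in range(VOCAB_SIZE)]
    print(word_trigrams("Hello"))
    print(trigram_activations("Hel")[:LOWERCASE_OVERLAP])
    print(trigram_activations("hel")[:LOWERCASE_OVERLAP])
    print(embed_word("Hello", toy_table))
```

In this sketch, the cased and lowercased variants of a trigram share their first two row indices, which is one way to realize a partial lowercase overlap of 2. The remaining eight indices are hashed from the exact casing.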

Minimal Example:

# download the checkpoint from Hugging Face (requires git-lfs)
apt-get install git-lfs
git lfs install
git clone https://huggingface.co/Aleph-Alpha/tfree-research-vocab-32k-fineweb-steps-370k-instruct-10k

# install the T-Free repository
git clone https://github.com/Aleph-Alpha/trigrams.git
pip install -e trigrams

# adjust checkpoint path in inference.py, then run it
python inference.py