This model supports our ongoing T-Free research. It is publicly available under the Open Aleph License, which explicitly allows non-commercial research and educational use.
The model was trained for 1 epoch on the fineweb-edu dataset with a sequence length of 4k and a batch size of 1k.
It has an embedding layer of dimension 32k, a vocab population of 10 (activations per trigram), and a partial lowercase overlap of 2 (i.e. 2 of the 10 activations overlap with the trigram's lowercase counterpart). The model aggregates all activations by summation only.
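The scheme above can be sketched as follows. This is an illustrative mock-up, not the released implementation: the hash function, salts, and padding convention are assumptions; only the numbers (32k embedding rows, 10 activations per trigram, 2 of which are shared with the lowercased trigram, sum aggregation) come from this card.

```python
import hashlib
import numpy as np

VOCAB_DIM = 32_000       # embedding rows (32k), from the model card
N_ACTIVATIONS = 10       # activations per trigram
N_LOWERCASE_OVERLAP = 2  # activations shared with the lowercased trigram

def trigram_slots(trigram: str) -> list[int]:
    """Map a trigram to 10 embedding rows; the last 2 are derived from the
    lowercased trigram, so cased/uncased variants partially overlap.
    (Hashing details are hypothetical.)"""
    def hashes(text: str, n: int, salt: str) -> list[int]:
        out = []
        for i in range(n):
            digest = hashlib.sha256(f"{salt}:{i}:{text}".encode()).digest()
            out.append(int.from_bytes(digest[:8], "big") % VOCAB_DIM)
        return out
    own = hashes(trigram, N_ACTIVATIONS - N_LOWERCASE_OVERLAP, "cased")
    shared = hashes(trigram.lower(), N_LOWERCASE_OVERLAP, "lower")
    return own + shared

def embed_word(word: str, table: np.ndarray) -> np.ndarray:
    # Pad with spaces so boundary characters appear in a trigram, then
    # sum the embedding rows of every activation (sum aggregation only).
    padded = f" {word} "
    rows = [slot for i in range(len(padded) - 2)
            for slot in trigram_slots(padded[i:i + 3])]
    return table[rows].sum(axis=0)
```

Because the last 2 of the 10 slots are computed from the lowercased trigram, `trigram_slots("The")` and `trigram_slots("the")` share those 2 rows, which is what gives cased and uncased surface forms partially overlapping embeddings.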
Minimal Example:
# download the checkpoint from Hugging Face
apt-get install git-lfs
git lfs install
git clone https://huggingface.co/Aleph-Alpha/tfree-research-vocab-32k-fineweb-steps-370k
# install this tfree repository
git clone https://github.com/Aleph-Alpha/trigrams.git
pip install -e trigrams
# adjust the checkpoint path in inference.py, then run it
python inference.py