bashFish committed
Commit a167acd
Parent: 4140f1b

adding checkpoint

00000370000_instruct_000000010000/tokenizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fad7016d158905c099e607b52c707b4f019fed531d95c6235a201758f6e4ba05
+ size 3718892766
00000370000_instruct_000000010000/tokenizer/tokenizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c7ae5ef18db0fc7b1168b6d95bbbe7d8b57a0dd0dd80cb772d8015a197673f4
+ size 2607973210
00000370000_instruct_000000010000/tokenizer/tokenizer_config.yaml ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee361247e7b839df69b83b6f42c4971f869304f8f877c991c270e1567abbd7c6
+ size 384
00000370000_instruct_000000010000/tokenizer_config.yaml ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee361247e7b839df69b83b6f42c4971f869304f8f877c991c270e1567abbd7c6
+ size 384
00000370000_instruct_000000010000/transformer/transformer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4c378f7056e0b061ed4571fde57ef7524d138c7849ef0a185875e4fa899d6f69
+ size 13413983204
00000370000_instruct_000000010000/transformer/transformer_config.yaml ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e8484ba14f7c37a7dc3ace6396e2bd824bd06e8e6d5ef6d732b83b1ef28e103
+ size 483
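
The files above are Git LFS pointers; the `oid` and `size` fields identify the actual blobs, which are fetched on checkout. A minimal sketch of pulling and loading these checkpoints with `huggingface_hub` and PyTorch (the `repo_id` below is a placeholder assumption, not something this commit specifies):

```python
# Hypothetical sketch: resolve the LFS-tracked files and load them with PyTorch.
import torch
from huggingface_hub import hf_hub_download

ckpt_dir = "00000370000_instruct_000000010000"
repo_id = "Aleph-Alpha/tfree-research-model"  # assumption: substitute the real repo id

# hf_hub_download resolves the LFS pointer to the real blob and caches it locally.
transformer_path = hf_hub_download(
    repo_id=repo_id,
    filename=f"{ckpt_dir}/transformer/transformer.pt",
)
tokenizer_path = hf_hub_download(
    repo_id=repo_id,
    filename=f"{ckpt_dir}/tokenizer/tokenizer.pt",
)

# Load on CPU to avoid requiring a GPU; the transformer blob is ~13 GB,
# so expect correspondingly high RAM use.
transformer_state = torch.load(transformer_path, map_location="cpu")
tokenizer_state = torch.load(tokenizer_path, map_location="cpu")
print(type(transformer_state), type(tokenizer_state))
```
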
README.md CHANGED
@@ -3,3 +3,10 @@ license: other
  license_name: open-aleph-license
  license_link: https://github.com/Aleph-Alpha/.github/blob/main/oal.pdf
  ---
+
+ This model supports our ongoing research on [T-Free](https://github.com/Aleph-Alpha/trigrams).
+ It is publicly available under the Open Aleph License, which explicitly allows non-commercial research and educational use.
+
+ The model was trained for 1 epoch on the [fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) dataset with a sequence length of 4k and a batch size of 1k. It was then further trained with llama-style instruction finetuning.
+
+ It has an embedding layer of dimension 32k, a vocab population of 10 (activations per trigram), and a partial lowercase overlap of 2 (i.e., 2 of the 10 activations overlap with the trigram's lowercase counterpart). The model aggregates all activations by taking the sum only.
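
As a rough, non-authoritative illustration of the activation scheme described above (the hashing, seeding, trigram extraction, and hidden size here are assumptions for illustration; only the 32k table size, the population of 10, the lowercase overlap of 2, and the sum aggregation come from the description), a T-Free-style trigram embedding might look like:

```python
# Illustrative sketch of a T-Free-style trigram embedding with sum aggregation.
# Hash functions, seeds, and trigram extraction are assumptions for illustration.
import hashlib
import torch

VOCAB_SIZE = 32_768        # embedding table entries ("dimension 32k")
POPULATION = 10            # activations per trigram
LOWERCASE_OVERLAP = 2      # of the 10, computed on the lowercased trigram

embedding = torch.nn.Embedding(VOCAB_SIZE, 512)  # hidden size 512 is an assumption

def trigram_ids(word: str) -> list[int]:
    ids = []
    padded = f" {word} "
    for i in range(len(padded) - 2):
        tri = padded[i : i + 3]
        # First 8 hashes use the raw trigram, the last 2 its lowercase
        # counterpart, so cased and uncased forms share 2 of 10 activations.
        for seed in range(POPULATION):
            key = tri.lower() if seed >= POPULATION - LOWERCASE_OVERLAP else tri
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            ids.append(int.from_bytes(digest[:4], "little") % VOCAB_SIZE)
    return ids

def embed_word(word: str) -> torch.Tensor:
    ids = torch.tensor(trigram_ids(word))
    # Aggregate all activations by taking the sum only, as described above.
    return embedding(ids).sum(dim=0)

vec = embed_word("Hello")
print(vec.shape)  # torch.Size([512])
```
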