Gkunsch committed
Commit 39afd30
1 Parent(s): 9373216

Update Training Data readme

Files changed (1)
1. README.md +10 -1
README.md CHANGED
@@ -2,6 +2,8 @@
 language:
 - multilingual
 license: apache-2.0
+datasets:
+- tiiuae/falcon-refinedweb
 ---

 # Model Card for Sindibad-7B
@@ -131,7 +133,14 @@ print(tokenizer.decode(outputs[0]))

 ## Training Data

-Guillaume
+Falcon-Mamba was trained on ~6,000 GT (gigatokens), mainly coming from [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), a large-scale web-only dataset that was filtered and deduplicated.
+Like the other models in the [Falcon](https://huggingface.co/tiiuae/falcon-11B) suite, Falcon-Mamba was trained with a multi-stage strategy that increases the training context length from 2,048 to 8,192 tokens.
+Note that the context length is not a constraint at inference time, since the Mamba architecture has no inherent limit on long-range dependencies.
+In the final training stage, a small portion of high-quality curated data was used to further enhance performance.
+
+Overall, the data sources included RefinedWeb-English, Refined-Multilingual (Latin languages), high-quality technical data, code data, and conversational data extracted from public sources.
+
+The data was tokenized with the Falcon-[7B](https://huggingface.co/tiiuae/falcon-7B)/[11B](https://huggingface.co/tiiuae/falcon-11B) tokenizer.

 ## Training Procedure
 Sindibad-7B was trained on 256 H100 80GB GPUs for the majority of the training, using a 3D parallelism strategy (TP=1, PP=1, DP=256) combined with ZeRO.
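
The updated Training Data section points at publicly available artifacts: the [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) corpus and the Falcon-7B/11B tokenizer. As a minimal sketch (not the model's actual data pipeline, and assuming the `datasets` and `transformers` libraries are installed), they can be inspected like this:

```python
# Minimal sketch: stream a few RefinedWeb documents and tokenize them with the
# Falcon-7B tokenizer mentioned in the Training Data section. Illustration only,
# not the actual Falcon-Mamba / Sindibad-7B data pipeline.
from datasets import load_dataset
from transformers import AutoTokenizer

# Stream the dataset so the full corpus is not downloaded locally.
refinedweb = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")

for i, sample in enumerate(refinedweb):
    # 8,192 matches the final training context length described above.
    ids = tokenizer(sample["content"], truncation=True, max_length=8192)["input_ids"]
    print(f"document {i}: {len(ids)} tokens")
    if i == 2:
        break
```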
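The Training Procedure line (TP=1, PP=1, DP=256, combined with ZeRO) describes pure data parallelism with sharded optimizer state. As a rough, hypothetical illustration (not the authors' actual training stack), a run of that shape can be sketched in PyTorch with `DistributedDataParallel` plus `ZeroRedundancyOptimizer`, which implements ZeRO stage-1-style optimizer sharding; the model, data, hyperparameters, and launch command below are placeholders:

```python
# Rough sketch of DP-only training with ZeRO-style optimizer-state sharding.
# Launch with e.g. torchrun --nnodes=32 --nproc_per_node=8 sketch.py
# (32 nodes x 8 GPUs = DP=256). Model, data, and hyperparameters are placeholders.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.optim import ZeroRedundancyOptimizer


def main():
    dist.init_process_group(backend="nccl")  # one process per GPU
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # Placeholder model standing in for the Mamba-based language model.
    model = torch.nn.Linear(4096, 4096).cuda()
    # TP=1, PP=1: the full model is replicated on every data-parallel rank.
    model = DDP(model, device_ids=[local_rank])

    # ZeRO stage-1 style: optimizer states are sharded across the DP ranks.
    optimizer = ZeroRedundancyOptimizer(
        model.parameters(),
        optimizer_class=torch.optim.AdamW,
        lr=1e-4,
    )

    for _ in range(10):  # placeholder training loop
        batch = torch.randn(8, 4096, device="cuda")
        loss = model(batch).pow(2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```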