yellowvm committed on
Commit 507600b
1 Parent(s): 58aeedf

Update README.md

Files changed (1):
  1. README.md +20 -0
README.md CHANGED
@@ -234,6 +234,26 @@ We evaluate our model on all benchmarks of the leaderboard's version 2 using the
  | `falcon2-11B` | 32.61 | 21.94 | 2.34 | 2.8 | 7.53 | 15.44 | 13.78 |
  | `Mistral-7B-v0.1` | 23.86 | 22.02 | 2.49 | 5.59 | 10.68 | 22.36 | 14.50 |

+
+ | `model name` |`ARC`|`HellaSwag`|`MMLU`|`Winogrande`|`TruthfulQA`|`GSM8K`|`Average`|
+ |:-------------------|:---:|:---------:|:----:|:----------:|:----------:|:-----:|:-------:|
+ | ***Pure SSM models***| | | | | | | |
+ | `Falcon-Mamba-7B` |62.03| 80.82 | 62.11| 73.64 | 53.42 | 52.54 | 64.09 |
+ | `mamba1` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `mamba2` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `mamba3` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ |***Hybrid SSM-attention models***|| | | | | | |
+ | `hybrid1` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `hybrid2` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `hybrid3` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ |***Transformer models***| | | | | | | |
+ | `Meta-Llama-3-8B` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `gemma-7B` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `falcon2-11B` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `Mistral-7B-v0.1` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+
  ## Throughput

  This model can achieve throughput and performance comparable to other transformer-based models that use optimized kernels such as Flash Attention 2. Make sure to install the optimized Mamba kernels with the following commands:
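
The install commands themselves sit outside this hunk. For reference, a minimal sketch of the setup the paragraph describes, assuming the upstream `mamba-ssm` and `causal-conv1d` kernel packages and the upstream `tiiuae/falcon-mamba-7b` checkpoint id (neither is confirmed by this diff):

```python
# Assumed kernel install, following the upstream FalconMamba instructions
# (not part of this hunk):
#   pip install "causal-conv1d>=1.4.0" mamba-ssm
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint id, borrowed from the upstream release.
model_id = "tiiuae/falcon-mamba-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps GPU memory use manageable
    device_map="auto",           # place weights on available devices
)

inputs = tokenizer("Falcon Mamba is", return_tensors="pt").to(model.device)
# With mamba-ssm/causal-conv1d installed, generation uses the fused kernels;
# otherwise transformers falls back to a slower pure-PyTorch implementation.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the kernels are missing, `transformers` silently uses the slow fallback path, so the throughput claim above assumes they are installed.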