yellowvm committed on
Commit 507600b
1 Parent(s): 58aeedf

Update README.md

Files changed (1):
  1. README.md +20 -0
README.md CHANGED
@@ -234,6 +234,26 @@ We evaluate our model on all benchmarks of the leaderboard's version 2 using the
  | `falcon2-11B` | 32.61 | 21.94 | 2.34 | 2.8 | 7.53 | 15.44 | 13.78 |
  | `Mistral-7B-v0.1` | 23.86 | 22.02 | 2.49 | 5.59 | 10.68 | 22.36 | 14.50 |

+
+ | `model name` |`ARC`|`HellaSwag`|`MMLU`|`Winogrande`|`TruthfulQA`|`GSM8K`|`Average`|
+ |:-------------------|:---:|:---------:|:----:|:----------:|:----------:|:-----:|:-------:|
+ | ***Pure SSM models***| | | | | | | |
+ | `Falcon-Mamba-7B` |62.03| 80.82 | 62.11| 73.64 | 53.42 | 52.54 | 64.09 |
+ | `mamba1` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `mamba2` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `mamba3` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ |***Hybrid SSM-attention models***|| | | | | | |
+ | `hybrid1` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `hybrid2` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `hybrid3` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ |***Transformer models***| | | | | | | |
+ | `Meta-Llama-3-8B` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `gemma-7B` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `falcon2-11B` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+ | `Mistral-7B-v0.1` |00.00| 00.00 | 00.00| 00.00 | 00.00 | 00.00 | 00.00 |
+
  ## Throughput

  This model can achieve throughput and performance comparable to other transformer-based models that use optimized kernels such as Flash Attention 2. Make sure to install the optimized Mamba kernels with the following commands:
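
The install commands themselves sit outside this hunk. For reference, a minimal sketch of the setup the paragraph describes, assuming the upstream `mamba-ssm` and `causal-conv1d` kernel packages and the upstream `tiiuae/falcon-mamba-7b` checkpoint id (neither is confirmed by this diff):

```python
# Assumed kernel install, following the upstream FalconMamba instructions
# (not part of this hunk):
#   pip install "causal-conv1d>=1.4.0" mamba-ssm
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint id, borrowed from the upstream release.
model_id = "tiiuae/falcon-mamba-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps GPU memory use manageable
    device_map="auto",           # place weights on available devices
)

inputs = tokenizer("Falcon Mamba is", return_tensors="pt").to(model.device)
# With mamba-ssm/causal-conv1d installed, generation uses the fused kernels;
# otherwise transformers falls back to a slower pure-PyTorch implementation.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the kernels are missing, `transformers` silently uses the slow fallback path, so the throughput claim above assumes they are installed.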