ybelkada committed on
Commit 3cd1fb0
1 Parent(s): 7eb9ef4

Update README.md

Files changed (1): README.md (+42 −0)
README.md CHANGED
@@ -85,6 +85,48 @@ print(tokenizer.decode(outputs[0]))
 
 </details>
 
+### Running the model on a GPU using `torch.compile`
+
+<details>
+<summary> Click to expand </summary>
+
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
+model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b", torch_dtype=torch.bfloat16).to(0)
+
+model = torch.compile(model)
+
+input_text = "Question: How many hours in one day? Answer: "
+input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+
+outputs = model.generate(input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+
+</details>
+
+<details>
+<summary> Click to expand </summary>
+
+```python
+# pip install accelerate
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
+model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-mamba-7b", device_map="auto")
+
+input_text = "Question: How many hours in one day? Answer: "
+input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+
+outputs = model.generate(input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+
+</details>
+
 ### Running the model on a GPU using different precisions
 
 #### FP16