---
license: mit
language:
- en
- ru
tags:
- gpt3
- transformers
---

# ruGPT-13B-4bit
These are the GPTQ model files for Sberbank's [ruGPT-3.5-13B](https://huggingface.co/ai-forever/ruGPT-3.5-13B) model.

## Technical details
The model was quantized to 4-bit with the [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) library.
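
The exact quantization settings (group size, calibration data) are not documented here; a minimal sketch of how a 4-bit GPTQ conversion of the base model could be reproduced with AutoGPTQ, using assumed defaults, looks like this:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_name = "ai-forever/ruGPT-3.5-13B"

# Assumed quantization settings; the values actually used for this checkpoint are not stated.
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # common AutoGPTQ default, assumed here
    desc_act=False,  # assumed; disables activation-order quantization
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name, use_fast=True)
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_name, quantize_config)

# GPTQ needs a calibration set; the Russian sentence below is only a placeholder example.
examples = [tokenizer("Буря мглою небо кроет, вихри снежные крутя...")]

model.quantize(examples)
model.save_quantized("ruGPT-13B-4bit", use_safetensors=True)
```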

## Examples of usage
First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:

```bash
GITHUB_ACTIONS=true pip install auto-gptq
```

Then try the following example code:

```python
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM

repo_name = "gurgutan/ruGPT-13B-4bit"

# load the tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(repo_name, use_fast=True)

# download the quantized model from the Hugging Face Hub and load it onto the first GPU
model = AutoGPTQForCausalLM.from_quantized(repo_name, device="cuda:0", use_safetensors=True, use_triton=False)

# inference with model.generate
request = "Буря мглою небо кроет"
inputs = tokenizer(request, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs)[0]))

# or use a text-generation pipeline instead
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline(request)[0]["generated_text"])
```
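
For longer or more varied completions, the usual Hugging Face generation arguments can be passed through `model.generate`; the values below are illustrative defaults, not settings recommended by the model authors:

```python
# continue from the tokenized `inputs` of the example above
output = model.generate(
    **inputs,
    max_new_tokens=128,      # length of the continuation; illustrative value
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.8,         # illustrative sampling temperature
    top_p=0.95,              # illustrative nucleus-sampling threshold
    repetition_penalty=1.1,  # illustrative; discourages verbatim repetition
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```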
# Original model:  [ruGPT-3.5 13B](https://huggingface.co/ai-forever/ruGPT-3.5-13B)
A language model for Russian. The model has 13B parameters, as you can guess from its name. This is our biggest model so far, and it was used for training GigaChat (read more about it in the [article](https://habr.com/ru/companies/sberbank/articles/730108/)).