---
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- SbertDistil
license: apache-2.0
datasets:
- wikimedia/wikipedia
- SiberiaSoft/SiberianPersonaChat-2
language:
- ru
- en
metrics:
- mse
library_name: transformers
---

# FractalGPT/SbertDistil

This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.

This is a small, fast model for measuring semantic similarity between sentences. We plan to make it smaller and faster in future versions.

[Project](https://github.com/FractalGPT/ModelEmbedderDistillation)

## Usage (Sentence-Transformers)

Using this model is easy once you have [sentence-transformers](https://www.SBERT.net) installed:

* [Run example in Colab](https://colab.research.google.com/drive/1m3fyh632htPs9UiEu4_AkQfrUtjDqIQq)

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
```

```python
model = SentenceTransformer('FractalGPT/SbertDistil')

def cos(x, y):
    # Cosine similarity between two 1-D embedding vectors
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
```

```python
text_1 = "Кто такой большой кот?"
text_2 = "Who is kitty?"

a = model.encode(text_1)
b = model.encode(text_2)

cos(a, b)
```

```
>>> 0.8072159157330788
```

## Training

* The original weights were taken from [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2).
* Training was conducted in two stages:

1. In the first stage, the model was trained on 4 million Wikipedia texts for three epochs.

2. In the second stage, the model was trained on Wikipedia and a dialog dataset for one epoch.

## Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 312, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Dense({'in_features': 312, 'out_features': 384, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
```
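
To make the data flow in the architecture above concrete, here is a minimal NumPy sketch of the last two modules: mean pooling over 312-dimensional token vectors, followed by a linear projection to 384 dimensions with an identity activation. The token embeddings, attention mask, and Dense weights are random stand-ins (assumptions for illustration, not the model's real parameters), and the helper names `mean_pool` and `dense` are hypothetical.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padded positions."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts

def dense(x, weight, bias):
    """Linear projection with identity activation: x @ W^T + b."""
    return x @ weight.T + bias

rng = np.random.default_rng(0)
batch, seq_len, hidden, out_dim = 2, 5, 312, 384

# Random stand-ins for the BertModel output and the Dense layer parameters.
token_embeddings = rng.normal(size=(batch, seq_len, hidden))
attention_mask = np.array([[1, 1, 1, 0, 0],    # first sentence: 3 real tokens, 2 padding
                           [1, 1, 1, 1, 1]])   # second sentence: 5 real tokens
weight = rng.normal(size=(out_dim, hidden))
bias = np.zeros(out_dim)

pooled = mean_pool(token_embeddings, attention_mask)   # shape (2, 312)
sentence_embeddings = dense(pooled, weight, bias)      # shape (2, 384)
print(pooled.shape, sentence_embeddings.shape)         # (2, 312) (2, 384)
```

The 384-dimensional output of the Dense layer is what `model.encode` returns, which is why the embedding size differs from the 312-dimensional hidden size of the underlying BertModel.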