olm/olm-roberta-base-oct-2022


Tristan committed on Dec 20, 2022

Commit b7e2e18 • 1 Parent(s): 0805050

Update README.md

Files changed (1)

README.md +45 -7


@@ -16,13 +16,51 @@ This model was created as part of the OLM project, which has the goal of continu
 This is important because we want our models to know about events like COVID or
 a presidential election right after they happen.
-## Intended uses
-You can use the raw model for text generation or fine-tune it to a downstream task.
-## How to use
-TODO
+## Intended uses & limitations
+You can use the raw model for masked language modeling, but it's mostly intended to
+be fine-tuned on a downstream task, such as sequence classification, token classification or question answering.
+### How to use
+You can use this model directly with a pipeline for masked language modeling:
+```python
+>>> from transformers import pipeline
+>>> unmasker = pipeline('fill-mask', model='olm/olm-roberta-base-oct-2022')
+>>> unmasker("Hello I'm a <mask> model.")
+[{'score': 0.10601336508989334,
+  'token': 2450,
+  'token_str': ' role',
+  'sequence': "Hello I'm a role model."},
+ {'score': 0.05792810395359993,
+  'token': 2677,
+  'token_str': ' former',
+  'sequence': "Hello I'm a former model."},
+ {'score': 0.057744599878787994,
+  'token': 1968,
+  'token_str': ' professional',
+  'sequence': "Hello I'm a professional model."},
+ {'score': 0.029099510982632637,
+  'token': 932,
+  'token_str': ' business',
+  'sequence': "Hello I'm a business model."},
+ {'score': 0.024220379069447517,
+  'token': 1840,
+  'token_str': ' young',
+  'sequence': "Hello I'm a young model."}]
+```
+Here is how to use this model to get the features of a given text in PyTorch:
+```python
+from transformers import AutoTokenizer, RobertaModel
+tokenizer = AutoTokenizer.from_pretrained('olm/olm-roberta-base-oct-2022')
+model = RobertaModel.from_pretrained("olm/olm-roberta-base-oct-2022")
+text = "Replace me by any text you'd like."
+encoded_input = tokenizer(text, return_tensors='pt')
+output = model(**encoded_input)
+```
 ## Dataset
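
The `fill-mask` output added to the README in this commit is a list of dicts with `score`, `token`, `token_str` and `sequence` keys. As a minimal sketch of consuming that output, the snippet below works on the first three predictions copied from the diff instead of calling the model, so it runs without downloading anything. The helper name `best_fill` and the `min_score` threshold are hypothetical and not part of the README or of `transformers`.

```python
# First three predictions copied verbatim from the README example in this diff.
predictions = [
    {"score": 0.10601336508989334, "token": 2450, "token_str": " role",
     "sequence": "Hello I'm a role model."},
    {"score": 0.05792810395359993, "token": 2677, "token_str": " former",
     "sequence": "Hello I'm a former model."},
    {"score": 0.057744599878787994, "token": 1968, "token_str": " professional",
     "sequence": "Hello I'm a professional model."},
]


def best_fill(preds, min_score=0.0):
    """Hypothetical helper: (token, score) pairs at or above min_score, best first."""
    kept = [p for p in preds if p["score"] >= min_score]
    kept.sort(key=lambda p: p["score"], reverse=True)
    # token_str has a leading space in the output shown above; strip it for display.
    return [(p["token_str"].strip(), p["score"]) for p in kept]


top = best_fill(predictions, min_score=0.05)
print(top[0][0])
```

In real use, `predictions` would be the return value of `unmasker("Hello I'm a <mask> model.")` from the README snippet.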