Tristan committed on
Commit
b7e2e18
1 Parent(s): 0805050

Update README.md

Files changed (1)
  1. README.md +45 -7
README.md CHANGED
@@ -16,13 +16,51 @@ This model was created as part of the OLM project, which has the goal of continu
  This is important because we want our models to know about events like COVID or
  a presidential election right after they happen.
 
- ## Intended uses
-
- You can use the raw model for text generation or fine-tune it to a downstream task.
-
- ## How to use
-
- TODO
+ ## Intended uses & limitations
+
+ You can use the raw model for masked language modeling, but it's mostly intended to
+ be fine-tuned on a downstream task, such as sequence classification, token classification or question answering.
+
+ ### How to use
+
+ You can use this model directly with a pipeline for masked language modeling:
+
+ ```python
+ >>> from transformers import pipeline
+ >>> unmasker = pipeline('fill-mask', model='olm/olm-roberta-base-oct-2022')
+ >>> unmasker("Hello I'm a <mask> model.")
+ [{'score': 0.10601336508989334,
+ 'token': 2450,
+ 'token_str': ' role',
+ 'sequence': "Hello I'm a role model."},
+ {'score': 0.05792810395359993,
+ 'token': 2677,
+ 'token_str': ' former',
+ 'sequence': "Hello I'm a former model."},
+ {'score': 0.057744599878787994,
+ 'token': 1968,
+ 'token_str': ' professional',
+ 'sequence': "Hello I'm a professional model."},
+ {'score': 0.029099510982632637,
+ 'token': 932,
+ 'token_str': ' business',
+ 'sequence': "Hello I'm a business model."},
+ {'score': 0.024220379069447517,
+ 'token': 1840,
+ 'token_str': ' young',
+ 'sequence': "Hello I'm a young model."}]
+ ```
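Reviewer note (not part of the commit): the pipeline output added above is a list with one dict per candidate, each carrying `score`, `token`, `token_str` and `sequence`. As a minimal sketch, assuming that shape and a single `<mask>` in the input, the candidates could be filtered and tidied as below. The helper name `top_fills` and its `min_score` parameter are hypothetical, and the sample data is copied from the output above so the snippet runs without downloading the model.

```python
from typing import Any, Dict, List, Tuple


def top_fills(
    predictions: List[Dict[str, Any]], min_score: float = 0.05
) -> List[Tuple[str, float]]:
    """Return (token, score) pairs whose score is at least min_score, best first.

    Hypothetical helper: it only assumes each prediction dict has the
    'score' and 'token_str' keys shown in the README output.
    """
    kept = [p for p in predictions if p["score"] >= min_score]
    kept.sort(key=lambda p: p["score"], reverse=True)
    # 'token_str' carries a leading space in the output above; strip it for display.
    return [(p["token_str"].strip(), round(p["score"], 4)) for p in kept]


# Sample data copied from the README output in this diff.
sample = [
    {"score": 0.10601336508989334, "token": 2450, "token_str": " role",
     "sequence": "Hello I'm a role model."},
    {"score": 0.05792810395359993, "token": 2677, "token_str": " former",
     "sequence": "Hello I'm a former model."},
    {"score": 0.057744599878787994, "token": 1968, "token_str": " professional",
     "sequence": "Hello I'm a professional model."},
    {"score": 0.029099510982632637, "token": 932, "token_str": " business",
     "sequence": "Hello I'm a business model."},
    {"score": 0.024220379069447517, "token": 1840, "token_str": " young",
     "sequence": "Hello I'm a young model."},
]

print(top_fills(sample))
# [('role', 0.106), ('former', 0.0579), ('professional', 0.0577)]
```

With the real pipeline, the same helper would presumably be called as `top_fills(unmasker("Hello I'm a <mask> model."))`; inputs with several masks return a nested list and would need one call per inner list.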
+
+ Here is how to use this model to get the features of a given text in PyTorch:
+
+ ```python
+ from transformers import AutoTokenizer, RobertaModel
+ tokenizer = AutoTokenizer.from_pretrained('olm/olm-roberta-base-oct-2022')
+ model = RobertaModel.from_pretrained("olm/olm-roberta-base-oct-2022")
+ text = "Replace me by any text you'd like."
+ encoded_input = tokenizer(text, return_tensors='pt')
+ output = model(**encoded_input)
+ ```
 
  ## Dataset