Francesco-A committed
Commit 9a6def1
Parent: 1d4b615

Update README.md

Files changed (1): README.md (+35, -14)
README.md CHANGED
@@ -8,28 +8,47 @@ datasets:
  model-index:
  - name: bert-finetuned-squad-v1
    results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # bert-finetuned-squad-v1

- This model is a fine-tuned version of [bert-base-cased](https://huggingface.co/bert-base-cased) on the squad dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

  ### Training hyperparameters
@@ -42,13 +61,15 @@ The following hyperparameters were used during training:
  - lr_scheduler_type: linear
  - num_epochs: 3

- ### Training results
-

  ### Framework versions

  - Transformers 4.32.1
  - Pytorch 2.0.1+cu118
  - Datasets 2.14.4
- - Tokenizers 0.13.3
 
  model-index:
  - name: bert-finetuned-squad-v1
    results: []
+ language:
+ - en
+ metrics:
+ - f1
+ - exact_match
  ---

+ ## bert-finetuned-squad-v1
+ This model is a fine-tuned version of [bert-base-cased](https://huggingface.co/bert-base-cased) on SQuAD (the Stanford Question Answering Dataset).

+ ## Model description
+ The bert-finetuned-squad-v1 model builds on the BERT (Bidirectional Encoder Representations from Transformers) architecture and has been fine-tuned for extractive question answering on SQuAD. Given a passage of text (the context) and a question, it predicts the start and end positions of the answer span within the context.
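The description above corresponds to the standard extractive QA interface in `transformers`. A minimal inference sketch, assuming the checkpoint is published under the repository id `Francesco-A/bert-finetuned-squad-v1` (inferred from this card's name, not confirmed):

```python
# Minimal inference sketch; the repo id is an assumption based on this card's name.
from transformers import pipeline

qa = pipeline("question-answering", model="Francesco-A/bert-finetuned-squad-v1")

result = qa(
    question="Where is the Eiffel Tower located?",
    context="The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': 'Paris, France'}
```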

+ ## Intended uses & limitations
+ ### Intended uses
+ - The model is designed for extractive question answering, where a question about a given context is answered with a span of text taken from that context.
+ - It can be used in applications such as chatbots, search engines, and any scenario where questions are answered from a given passage.

+ ### Limitations
+ - Performance may vary with the complexity and length of the questions and contexts.
+ - It may not perform well on questions that require common-sense reasoning or world knowledge beyond the training data.
+ - The output is limited to a single contiguous span within the context, so it cannot cover multi-sentence or compositional answers.
+ - It is not suitable for tasks that require generating lengthy or abstractive answers.

+ ## Training and evaluation data
+ The model was trained on the SQuAD dataset, which consists of two main splits:
+ - Training set: 87,599 examples, each consisting of a context, a question, and the corresponding answer span(s).
+ - Validation set: 10,570 examples with the same structure, used to evaluate the model during training.
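The splits described above can be inspected directly with the `datasets` library:

```python
# Load the two SQuAD splits described above and check their sizes.
from datasets import load_dataset

squad = load_dataset("squad")
print(squad["train"].num_rows)       # 87599
print(squad["validation"].num_rows)  # 10570
```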

+ ## Training procedure
+ The training process involved several key steps:
+ 1. Preprocessing: The training data was preprocessed to convert text inputs into numerical IDs with a BERT tokenizer, and labels for the start and end positions of the answer spans were generated.

+ 2. Sliding window: To handle long contexts, a sliding-window approach was employed: long contexts were split into multiple input features with overlapping tokens (see the tokenization sketch below).
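A sketch of steps 1-2 using the `transformers` tokenizer API; `max_length=384` and `stride=128` are assumed, typical values for SQuAD fine-tuning that this card does not actually state:

```python
# Tokenize (question, context) pairs, splitting long contexts into overlapping
# windows. max_length and stride are assumed values, not taken from this card.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

question = "Where is the Eiffel Tower?"
context = "The Eiffel Tower is in Paris. " * 100  # artificially long context

features = tokenizer(
    question,
    context,
    max_length=384,
    truncation="only_second",        # truncate only the context, never the question
    stride=128,                      # token overlap between consecutive windows
    return_overflowing_tokens=True,  # emit one feature per window
    return_offsets_mapping=True,     # char offsets, needed for span labels later
)
print(len(features["input_ids"]))    # number of overlapping windows
```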

+ 3. Fine-tuning: The model was fine-tuned on the SQuAD training data by minimizing the loss on the predicted start and end positions of the answer spans.

+ 4. Post-processing: During inference, the model predicts start and end logits for answer spans; these logits are scored to select the best span, and the prediction is mapped back to a text span via the token-to-character offsets (see the sketch below).
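A deliberately simplified sketch of step 4, using a single feature and a greedy argmax rather than the usual n-best search over all windows (the repo id is again an assumption):

```python
# Pick the best start/end positions from the logits and map them back to text.
# Full post-processing would score n-best pairs across all overlapping windows.
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_id = "Francesco-A/bert-finetuned-squad-v1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForQuestionAnswering.from_pretrained(model_id)

question = "What color is the sky?"
context = "On a clear day the sky is blue."
inputs = tokenizer(question, context, return_offsets_mapping=True, return_tensors="pt")
offsets = inputs.pop("offset_mapping")[0]  # per-token character spans

with torch.no_grad():
    outputs = model(**inputs)

start = int(outputs.start_logits.argmax())  # greedy; assumes the span lands in the context
end = int(outputs.end_logits.argmax())
print(context[int(offsets[start][0]) : int(offsets[end][1])])  # expected: "blue"
```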

+ 5. Evaluation: The model's performance was evaluated on the SQuAD validation set using exact match (EM) and F1, which measure the accuracy of the predicted answers (see the metric sketch below).
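Step 5 corresponds to the standard SQuAD metric, available through the `evaluate` library:

```python
# Compute exact match and F1 (the metrics reported below) with the "squad" metric.
import evaluate

squad_metric = evaluate.load("squad")

predictions = [{"id": "1", "prediction_text": "Paris"}]
references = [{"id": "1", "answers": {"text": ["Paris"], "answer_start": [0]}}]

print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}
```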

  ### Training hyperparameters

  - lr_scheduler_type: linear
  - num_epochs: 3
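Only the scheduler type and epoch count are shown in this excerpt. A sketch of how they map onto `TrainingArguments`; every other value below is a placeholder, not one actually used for this model:

```python
# Map the two hyperparameters listed above onto TrainingArguments.
# learning_rate and batch size are placeholder assumptions, not from this card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bert-finetuned-squad-v1",
    lr_scheduler_type="linear",     # from the card
    num_train_epochs=3,             # from the card
    learning_rate=2e-5,             # placeholder assumption
    per_device_train_batch_size=8,  # placeholder assumption
)
```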

+ ### Validation results
+ - Number of validation examples: 10,570
+ - Exact match (EM): 80.86%
+ - F1 score: 88.28%
 
  ### Framework versions

  - Transformers 4.32.1
  - Pytorch 2.0.1+cu118
  - Datasets 2.14.4
+ - Tokenizers 0.13.3