dlicari committed on
Commit
8b71f51
1 Parent(s): 52acc9c

Create README.md

  1. README.md +17 -0
README.md ADDED
@@ -0,0 +1,17 @@
+ ---
+ language: it
+ license: apache-2.0
+ widget:
+ - text: "Il [MASK] ha chiesto revocarsi l'obbligo di pagamento"
+ ---
+
+ # ITALIAN-LEGAL-BERT-SC
+ ITALIAN-LEGAL-BERT-SC (ITA-LEGAL-BERT-SC) is the variant of [ITALIAN-LEGAL-BERT](https://huggingface.co/dlicari/Italian-Legal-BERT) that was pre-trained from scratch on Italian legal documents. It is based on the CamemBERT architecture.
+
+ ## Training procedure
+ The model was trained from scratch on a larger training dataset: 6.6 GB of civil and criminal cases.
+ We used the [CamemBERT](https://huggingface.co/docs/transformers/main/en/model_doc/camembert) architecture with a language modeling head on top, the AdamW optimizer, an initial learning rate of 2e-5 (with linear learning rate decay), a sequence length of 512, a batch size of 18, and 1 million training steps.
+ Training ran on 8 NVIDIA A100 40GB GPUs using distributed data parallel (each step processes 8 batches). The model uses SentencePiece tokenization trained from scratch on a subset of the training set (5 million sentences),
+ with a vocabulary size of 32,000.
+
+
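The hyperparameters above can be turned into a few concrete numbers. The following is a minimal sketch, not the authors' training code: it assumes the batch size of 18 is per GPU (so one step covers 8 batches of 18 sequences), that the learning rate decays linearly to zero with no warmup, and that every sequence is at most 512 tokens. The function names `linear_decay_lr` and `sequences_per_step` are hypothetical helpers introduced for illustration.

```python
# Hyperparameters quoted in the README.
INITIAL_LR = 2e-5
TOTAL_STEPS = 1_000_000
PER_DEVICE_BATCH = 18   # assumption: the stated batch size is per GPU
NUM_DEVICES = 8
SEQ_LEN = 512


def linear_decay_lr(step: int,
                    initial_lr: float = INITIAL_LR,
                    total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer steps.

    Assumes a linear decay from `initial_lr` to zero with no warmup;
    the README does not say whether warmup was used.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    remaining = max(0.0, 1.0 - step / total_steps)
    return initial_lr * remaining


def sequences_per_step(per_device_batch: int = PER_DEVICE_BATCH,
                       num_devices: int = NUM_DEVICES) -> int:
    """Sequences processed per optimizer step under data parallelism."""
    return per_device_batch * num_devices


effective_batch = sequences_per_step()
print(effective_batch)               # 144 sequences per step
print(effective_batch * SEQ_LEN)     # 73728 tokens per step at most
print(linear_decay_lr(0))            # initial learning rate
print(linear_decay_lr(TOTAL_STEPS))  # 0.0 at the end of training
```

Under these assumptions, each step sees at most 144 sequences, and the learning rate halfway through training (step 500,000) is half of its initial value.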