---
license: agpl-3.0
language:
- it
task_categories:
- token-classification
datasets:
- mrovera/eventnet-ita
tags:
- Frame Parsing
- Event Extraction
---

# EventNet-ITA

This model is a full-text frame parser for events in Italian, trained on [EventNet-ITA](https://huggingface.co/datasets/mrovera/eventnet-ita). It can be used for _full-text_ Frame Parsing and Event Extraction. Please refer to the [paper](https://aclanthology.org/2024.latechclfl-1.9) for a more detailed description.

## Model Details

### Model Description

In its current version, EventNet-ITA recognizes and classifies 205 semantic frames and their (frame-specific) frame elements. The unit of analysis is the sentence.

### Direct Use

Given an input sequence of tokens, the model labels each token with the corresponding frame and/or frame element label(s). Labels follow the BIO scheme: a bare frame label (e.g. `B-CONQUERING`) marks the word(s) evoking the frame, a label of the form `ROLE*FRAME` (e.g. `B-THEME*CONQUERING`) marks a frame element of that frame, `O` marks tokens outside any annotation, and multiple labels on the same token are separated by `|`.

```
La	B-ENTITY*BEING_LOCATED|B-THEME*CONQUERING
cittadina	I-ENTITY*BEING_LOCATED|I-THEME*CONQUERING
,	O
posta	B-BEING_LOCATED
a	B-RELATIVE_LOCATION*BEING_LOCATED
est	I-RELATIVE_LOCATION*BEING_LOCATED
del	I-RELATIVE_LOCATION*BEING_LOCATED
corso	I-RELATIVE_LOCATION*BEING_LOCATED
d'	I-RELATIVE_LOCATION*BEING_LOCATED
acqua	I-RELATIVE_LOCATION*BEING_LOCATED
,	O
venne	O
conquistata	B-CONQUERING
,	O
ma	O
il	B-EXPLOSIVE*DETONATE_EXPLOSIVE
ponte	I-EXPLOSIVE*DETONATE_EXPLOSIVE
sul	I-EXPLOSIVE*DETONATE_EXPLOSIVE
fiume	I-EXPLOSIVE*DETONATE_EXPLOSIVE
era	O
già	O
stato	O
fatto	B-DETONATE_EXPLOSIVE
saltare	I-DETONATE_EXPLOSIVE
regolarmente	O
dai	B-AGENT*DETONATE_EXPLOSIVE
genieri	I-AGENT*DETONATE_EXPLOSIVE
francesi	I-AGENT*DETONATE_EXPLOSIVE
.	O
```

## Training Details

The model was trained with [MaChAmp](https://github.com/machamp-nlp/machamp), a Python toolkit supporting a variety of NLP tasks, by fine-tuning [this Italian BERT pretrained model](https://huggingface.co/dbmdz/bert-base-italian-xxl-cased).

Training hyperparameters:

- Batch size: 64
- Learning rate: 1.5e-3

All other hyperparameters were left unchanged with respect to
the default MaChAmp configuration for the multi-label sequence labeling (`multiseq`) task.

### Training Data

Please refer to the [dataset repo](https://huggingface.co/datasets/mrovera/eventnet-ita).

### Model Re-training

To re-train the model, download the [dataset](https://huggingface.co/datasets/mrovera/eventnet-ita) and follow the instructions for training a [multiseq task](https://github.com/machamp-nlp/machamp/blob/master/docs/multiseq.md) in MaChAmp.

### Inference

The EventNet-ITA model can be used for Frame Parsing on new texts. To do so, follow these steps:

1. Clone the MaChAmp GitHub repo: `git clone https://github.com/machamp-nlp/machamp.git`
2. Download the EventNet-ITA model from this repo (450 MB) and move it into the `machamp` folder (the exact location is up to you; by default, MaChAmp saves trained models in the `logs` folder).
3. Save the data you want to parse in a two-column TSV file, with one word per line, a placeholder (`_`) in the second column, and sentences separated by a blank line (without placeholder), like this:

   ```
   This	_
   is	_
   the	_
   first	_
   sentence	_
   .	_

   This	_
   is	_
   the	_
   second	_
   one	_
   .	_
   ```

4. Follow the instructions for predicting with a fine-tuned model in [MaChAmp](https://github.com/machamp-nlp/machamp) (see the "Prediction" section).

## Evaluation

The model was evaluated on three folds, each time with a stratified split of the dataset using an 80/10/10 train/dev/test ratio. Please see the paper for further details. Below, we report the Precision, Recall and F1-score values averaged over the three splits.
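The card does not spell out how the plain and "(weighted)" rows are computed. Assuming the usual convention (plain = unweighted mean of per-label scores, weighted = mean weighted by each label's support, as with `average="macro"` and `average="weighted"` in scikit-learn's `precision_recall_fscore_support`), the minimal sketch below shows both averages, plus the per-metric mean across folds described above. The label counts and fold scores are hypothetical toy numbers, not EventNet-ITA results, and the helper names are illustrative only.

```python
from statistics import mean


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def average_scores(counts, weighted=False):
    """Average per-label (P, R, F1); counts maps label -> (tp, fp, fn).

    weighted=False: unweighted (macro) mean over labels.
    weighted=True: mean weighted by each label's support (tp + fn).
    Assumed convention, not confirmed by the model card.
    """
    scores = {label: prf(*c) for label, c in counts.items()}
    if not weighted:
        return tuple(mean(s[i] for s in scores.values()) for i in range(3))
    support = {label: tp + fn for label, (tp, _fp, fn) in counts.items()}
    total = sum(support.values())
    return tuple(
        sum(scores[label][i] * support[label] for label in scores) / total
        for i in range(3)
    )


def average_over_folds(fold_scores):
    """Component-wise mean of (P, R, F1) tuples across folds."""
    return tuple(mean(fold[i] for fold in fold_scores) for i in range(3))


# Hypothetical counts for two frame labels (toy data, not real results).
counts = {"CONQUERING": (8, 2, 0), "BEING_LOCATED": (1, 1, 3)}
macro = average_scores(counts)
weighted = average_scores(counts, weighted=True)
print(tuple(round(x, 3) for x in macro))     # (0.65, 0.625, 0.611)
print(tuple(round(x, 3) for x in weighted))  # (0.7, 0.75, 0.704)

# Macro F1 is a mean of per-label F1 values, so it generally differs
# from the harmonic mean of macro P and macro R.
p, r, _ = macro
print(round(2 * p * r / (p + r), 3))  # 0.637

# Hypothetical per-fold (P, R, F1) scores, averaged across three folds.
folds = [(0.90, 0.92, 0.91), (0.88, 0.90, 0.89), (0.92, 0.94, 0.93)]
print(tuple(round(x, 3) for x in average_over_folds(folds)))  # (0.9, 0.92, 0.91)
```

Under this assumption, the F1 column in the tables is not expected to equal the harmonic mean of the P and R columns in the same row, because each F1 is itself an average of per-label F1 values.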

**Token-based** (**_relaxed_**) performance:

|                           | P     | R     | F1        |
|---------------------------|-------|-------|-----------|
| Frames                    | 0.904 | 0.914 | **0.907** |
| Frames (weighted)         | 0.909 | 0.919 | 0.913     |
| Frame Elements            | 0.841 | 0.724 | **0.761** |
| Frame Elements (weighted) | 0.850 | 0.779 | 0.804     |

**Span-based** (**_strict_**) performance:

|                           | P     | R     | F1        |
|---------------------------|-------|-------|-----------|
| Frames                    | 0.906 | 0.899 | **0.901** |
| Frames (weighted)         | 0.909 | 0.903 | 0.905     |
| Frame Elements            | 0.829 | 0.666 | **0.724** |
| Frame Elements (weighted) | 0.853 | 0.711 | 0.768     |

## Citation Information

If you use EventNet-ITA, please cite the following paper:

```
@inproceedings{rovera-2024-eventnet,
    title = "{E}vent{N}et-{ITA}: {I}talian Frame Parsing for Events",
    author = "Rovera, Marco",
    editor = "Bizzoni, Yuri and Degaetano-Ortlieb, Stefania and Kazantseva, Anna and Szpakowicz, Stan",
    booktitle = "Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)",
    year = "2024",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.latechclfl-1.9",
    pages = "77--90",
}
```