---
language:
- pl
- en
pipeline_tag: text-classification
widget:
- text: TRUMP needs undecided voters
  example_title: example 1
- text: Oczywiście ze Pan Prezydent to nasza duma narodowa!!
  example_title: example 2
tags:
- text
- sentiment
- politics
- text-classification
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: sentimenTw-political
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: social media
      name: politics
    metrics:
    - type: f1 macro
      value: 71.2
    - type: accuracy
      value: 74
---

# eevvgg/sentimenTw-political

This model is a fine-tuned version of the multilingual model [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment).
It classifies text sentiment into three categories: negative, neutral, and positive.
It was fine-tuned on a 2k sample of manually annotated Reddit (EN) and Twitter (PL) data.

- **Developed by:** Ewelina Gajewska as part of the ComPathos project: https://www.ncn.gov.pl/sites/default/files/listy-rankingowe/2020-09-30apsv2/streszczenia/497124-en.pdf
- **Model type:** XLM-RoBERTa for sentiment classification
- **Language(s) (NLP):** Multilingual; fine-tuned on 1k English texts from Reddit and 1k Polish tweets
- **License:** [More Information Needed]
- **Finetuned from model:** [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)

# Uses

Sentiment classification of multilingual data. The model was fine-tuned on a 2k sample of English and Polish social media texts from the political domain.
It is suited to short texts (up to 200 tokens).
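
One way to respect the 200-token guidance is to truncate inputs at tokenization time. The sketch below is only an illustration and is not part of the original card: `classify_short_texts` is a hypothetical helper, and it assumes the classifier is a `transformers` text-classification pipeline, which forwards `truncation` and `max_length` to its tokenizer.

```python
def classify_short_texts(texts, classifier, max_tokens=200):
    """Run `classifier` on `texts`, truncating each input to `max_tokens` tokens.

    `classifier` is assumed to be a transformers text-classification pipeline,
    which passes `truncation` and `max_length` through to its tokenizer.
    """
    return classifier(list(texts), truncation=True, max_length=max_tokens)


# Usage (downloads the model, so it is left commented out here):
# from transformers import pipeline
# sentiment_task = pipeline("text-classification", model="eevvgg/sentimenTw-political")
# classify_short_texts(["TRUMP needs undecided voters"], sentiment_task)
```

Longer inputs are cut rather than rejected, so predictions on truncated text should be treated with some caution.
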

## How to Get Started with the Model

```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model_path = "eevvgg/sentimenTw-political"
sentiment_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)

# One English and one Polish example.
sequence = ["TRUMP needs undecided voters",
            "Oczywiście ze Pan Prezydent to nasza duma narodowa!!"]

# Each result is a dict with a 'label' and a 'score'.
result = sentiment_task(sequence)
labels = [i['label'] for i in result]  # ['neutral', 'positive']
```
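
The pipeline returns one dictionary per input, with a `label` and a `score` key. As a small illustration, the hypothetical helper `pair_predictions` below (not part of the original card) matches each text to its prediction. The example results are hand-written stand-ins with made-up scores, not real model output.

```python
def pair_predictions(texts, results):
    """Pair each input text with its predicted label and rounded score."""
    return [(text, r["label"], round(r["score"], 3)) for text, r in zip(texts, results)]


texts = ["TRUMP needs undecided voters",
         "Oczywiście ze Pan Prezydent to nasza duma narodowa!!"]

# Stand-in for `sentiment_task(texts)`; the scores here are invented for illustration.
mock_results = [{"label": "neutral", "score": 0.9},
                {"label": "positive", "score": 0.8}]

for text, label, score in pair_predictions(texts, mock_results):
    print(f"{label:>8}  {score:.3f}  {text}")
```
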

## Model Sources

- **Repository:** [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
- **Paper:** TBA
- **BibTeX citation:**

```
@misc{SentimenTwGK2023,
  author={Gajewska, Ewelina and Konat, Barbara},
  title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
  year={2023},
  howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}
```

# Training Details

- Trained for 3 epochs with a mini-batch size of 8.
- Training loss: 0.515
- See the [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing) for details.

### Preprocessing

- Hyperlinks and user mentions (@) are normalized to the tokens "http" and "@user", respectively, and extra spaces are removed.
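
The normalization described above could be sketched roughly as follows. This is an assumption about how such a step might look, not the code used for training (see the notebook for that): the function name `normalize_text` and the regular expressions are illustrative choices.

```python
import re


def normalize_text(text):
    """Replace links with 'http', mentions with '@user', and collapse extra spaces."""
    text = re.sub(r"https?://\S+|www\.\S+", "http", text)  # hyperlinks -> "http"
    text = re.sub(r"(?<!\w)@\w+", "@user", text)            # user mentions -> "@user"
    return re.sub(r"\s+", " ", text).strip()                # remove extra whitespace


print(normalize_text("@JohnDoe  check   https://example.com/news now"))
# @user check http now
```

Applying the same normalization at inference time keeps inputs consistent with what the model saw during fine-tuning.
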

### Speeds, Sizes, Times

- See the [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing).

# Evaluation

- Evaluation was run on a sample of 200 texts (10% of the data).

## Results

- accuracy: 74.0
- macro avg:
  - f1: 71.2
  - precision: 72.8
  - recall: 70.8
- weighted avg:
  - f1: 73.3
  - precision: 74.0
  - recall: 74.0

| class    | precision | recall | f1-score | support |
|----------|-----------|--------|----------|---------|
| negative | 0.752     | 0.901  | 0.820    | 91      |
| neutral  | 0.764     | 0.592  | 0.667    | 71      |
| positive | 0.667     | 0.632  | 0.649    | 38      |
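
As a sanity check, the macro and weighted averages can be recomputed from the per-class rows. The snippet below is a small illustrative calculation, not the original evaluation code; the names `report`, `macro_avg`, and `weighted_avg` are chosen here for the example.

```python
# Per-class scores copied from the results above: (precision, recall, f1, support).
report = {
    "negative": (0.752, 0.901, 0.820, 91),
    "neutral":  (0.764, 0.592, 0.667, 71),
    "positive": (0.667, 0.632, 0.649, 38),
}


def macro_avg(col):
    """Unweighted mean of one metric column over the classes."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted_avg(col):
    """Mean of one metric column, weighted by class support."""
    total = sum(row[3] for row in report.values())
    return sum(row[col] * row[3] for row in report.values()) / total


for name, col in [("precision", 0), ("recall", 1), ("f1", 2)]:
    print(f"{name:>9}: macro {100 * macro_avg(col):.1f}, weighted {100 * weighted_avg(col):.1f}")
# precision: macro 72.8, weighted 74.0
#    recall: macro 70.8, weighted 74.0
#        f1: macro 71.2, weighted 73.3
```

For single-label classification, support-weighted recall equals overall accuracy, which is consistent with the reported accuracy of 74.0.
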

# Citation

**BibTeX:**

```
@misc{SentimenTwGK2023,
  author={Gajewska, Ewelina and Konat, Barbara},
  title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
  year={2023},
  howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}
```

**APA:**

```
Gajewska, E., & Konat, B. (2023).
SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media.
https://huggingface.co/eevvgg/sentimenTw-political
```