---
language:
- pl
- en
pipeline_tag: text-classification
widget:
- text: TRUMP needs undecided voters
  example_title: example 1
- text: Oczywiście ze Pan Prezydent to nasza duma narodowa!!
  example_title: example 2
tags:
- text
- sentiment
- politics
- text-classification
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: sentimenTw-political
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: social media
      name: politics
    metrics:
    - type: f1 macro
      value: 71.2
    - type: accuracy
      value: 74
---
# eevvgg/sentimenTw-political
This model is a fine-tuned version of the multilingual model [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment).
It classifies text sentiment into three categories: negative, neutral, and positive.
It was fine-tuned on a 2k sample of manually annotated Reddit (EN) and Twitter (PL) data.
- **Developed by:** Ewelina Gajewska as part of the ComPathos project: https://www.ncn.gov.pl/sites/default/files/listy-rankingowe/2020-09-30apsv2/streszczenia/497124-en.pdf
- **Model type:** RoBERTa for sentiment classification
- **Language(s) (NLP):** Multilingual; fine-tuned on 1k English texts from Reddit and 1k Polish tweets
- **License:** [More Information Needed]
- **Finetuned from model:** [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)
# Uses
Sentiment classification of multilingual data. The model was fine-tuned on a 2k sample of English and Polish social media texts from the political domain.
The model is suited to short texts (up to 200 tokens).
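To check whether an input fits the recommended length, one option is to count its tokens before classification. The sketch below is a minimal illustration, not part of the original model code: the helper name `within_token_limit` is hypothetical, and it assumes the 200-token guideline counts every token the tokenizer emits, including special tokens.

```python
def within_token_limit(text, tokenizer, max_tokens=200):
    """Return True if `text` encodes to at most `max_tokens` tokens.

    `tokenizer` is any callable that returns a mapping with an "input_ids"
    entry, as Hugging Face tokenizers do. The default of 200 mirrors the
    guideline above; counting special tokens is an assumption.
    """
    n_tokens = len(tokenizer(text)["input_ids"])
    return n_tokens <= max_tokens


# Example usage (downloads the tokenizer from the Hugging Face Hub):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("eevvgg/sentimenTw-political")
# within_token_limit("TRUMP needs undecided voters", tokenizer)
```

Longer texts can be split or truncated before they are passed to the model; which strategy works best for this model is not stated in the card.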
## How to Get Started with the Model
```
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_path = "eevvgg/sentimenTw-political"
sentiment_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)

# One English and one Polish example
sequence = ["TRUMP needs undecided voters",
            "Oczywiście ze Pan Prezydent to nasza duma narodowa!!"]

result = sentiment_task(sequence)  # one {'label': ..., 'score': ...} dict per input
labels = [i['label'] for i in result]  # ['neutral', 'positive']
```
## Model Sources
- **Repository:** [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
- **Paper:** TBA
# Training Details
- Trained for 3 epochs with a mini-batch size of 8.
- Training loss: 0.515
- See details in [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
### Preprocessing
- Hyperlinks and user mentions (@) are normalized to "http" and "@user" tokens, respectively. Extra spaces are removed.
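
The preprocessing described above can be approximated as follows. This is a minimal sketch, not the exact code used in training (see the Colab notebook for that): the regular expressions and the function name `normalize` are assumptions chosen for illustration.

```python
import re

# Assumed patterns; the training notebook may define these differently.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")  # skips the "@" inside e-mail addresses
SPACES_RE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Replace links with "http", mentions with "@user", and collapse spaces."""
    text = URL_RE.sub("http", text)
    text = MENTION_RE.sub("@user", text)
    return SPACES_RE.sub(" ", text).strip()


print(normalize("@JohnDoe  check https://example.com/x  now"))
# @user check http now
```

Applying the same normalization at inference time keeps inputs consistent with the data the model was fine-tuned on.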
### Speeds, Sizes, Times
- See [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
# Evaluation
- Evaluation was run on a sample of 200 texts (10% of the data).
## Results
- accuracy: 74.0
- macro avg:
    - f1: 71.2
    - precision: 72.8
    - recall: 70.8
- weighted avg:
    - f1: 73.3
    - precision: 74.0
    - recall: 74.0

| class | precision | recall | f1-score | support |
|---|---|---|---|---|
| negative | 0.752 | 0.901 | 0.820 | 91 |
| neutral | 0.764 | 0.592 | 0.667 | 71 |
| positive | 0.667 | 0.632 | 0.649 | 38 |
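
The macro and weighted averages listed above follow from the per-class figures. The sketch below recomputes them from the rounded per-class values; the helper names are illustrative, and small differences from averages computed on unrounded scores are possible.

```python
# Per-class (precision, recall, f1, support), copied from the results above.
report = {
    "negative": (0.752, 0.901, 0.820, 91),
    "neutral": (0.764, 0.592, 0.667, 71),
    "positive": (0.667, 0.632, 0.649, 38),
}


def macro_average(rows, idx):
    """Unweighted mean of one metric across classes."""
    return sum(row[idx] for row in rows.values()) / len(rows)


def weighted_average(rows, idx):
    """Mean of one metric across classes, weighted by class support."""
    total = sum(row[3] for row in rows.values())
    return sum(row[idx] * row[3] for row in rows.values()) / total


for idx, name in enumerate(["precision", "recall", "f1"]):
    macro = 100 * macro_average(report, idx)
    weighted = 100 * weighted_average(report, idx)
    print(f"{name}: macro {macro:.1f}, weighted {weighted:.1f}")
```

The macro average treats the three classes equally, so it is lower than the weighted average here: the largest class (negative) is also the one the model handles best.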
# Citation
**BibTeX:**
```
@misc{SentimenTwGK2023,
author={Gajewska, Ewelina and Konat, Barbara},
title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
year={2023},
howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}
```
**APA:**
```
Gajewska, E., & Konat, B. (2023).
SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media.
https://huggingface.co/eevvgg/sentimenTw-political.
```