---
language:
- pl
- en
pipeline_tag: text-classification
widget:
- text: TRUMP needs undecided voters
  example_title: example 1
- text: Oczywiście ze Pan Prezydent to nasza duma narodowa!!
  example_title: example 2
tags:
- text
- sentiment
- politics
- text-classification
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: sentimenTw-political
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: social media
      name: politics
    metrics:
    - type: f1 macro
      value: 71.2
    - type: accuracy
      value: 74
---

# eevvgg/sentimenTw-political

This model is a fine-tuned version of multilingual model [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment). 
Classification of text sentiment into 3 categories: negative, neutral, positive.
Fine-tuned on a 2k sample of manually annotated Reddit (EN) and Twitter (PL) data.


- **Developed by:** Ewelina Gajewska as a part of ComPathos project: https://www.ncn.gov.pl/sites/default/files/listy-rankingowe/2020-09-30apsv2/streszczenia/497124-en.pdf

- **Model type:** RoBERTa for sentiment classification
- **Language(s) (NLP):** Multilingual; finetuned on 1k English text from Reddit and 1k Polish tweets 
- **License:** [More Information Needed]
- **Finetuned from model:** [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)


# Uses

Sentiment classification in multilingual data. Fine-tuned on a 2k English and Polish sample of social media texts from political domain.
Model suited for short text (up to 200 tokens) .


## How to Get Started with the Model

```
from transformers import pipeline

model_path = "eevvgg/sentimenTw-political"
sentiment_task = pipeline(task = "text-classification", model = model_path, tokenizer = model_path)

sequence = ["TRUMP needs undecided voters",
            "Oczywiście ze Pan Prezydent to nasza duma narodowa!!"]
            
result = sentiment_task(sequence)
labels = [i['label'] for i in result] # ['neutral', 'positive']            

```


## Model Sources 

- **Repository:** [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
- **Paper:** TBA
- **BibTex citation:** 
```
@misc{SentimenTwGK2023,
  author={Gajewska, Ewelina and Konat, Barbara},
  title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
  year={2023},
  howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}
```

# Training Details

- Trained for 3 epochs, mini-batch size of 8.
- Training results: loss: 0.515
- See details in [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)

### Preprocessing

- Hyperlinks and user mentions (@) normalization to "http" and "@user" tokens, respectively. Removal of extra spaces.

### Speeds, Sizes, Times

- See [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)

# Evaluation

- Evaluation run on a sample of 200 texts (10\% of data).

## Results

- accuracy: 74.0
- macro avg:
  - f1: 71.2
  - precision: 72.8
  - recall: 70.8
- weighted avg:
  - f1: 73.3
  - precision: 74.0
  - recall: 74.0


                       precision    recall  f1-score   support

           negative      0.752     0.901     0.820        91
           neutral       0.764     0.592     0.667        71
           positive      0.667     0.632     0.649        38


# Citation 

**BibTeX:**

```
@misc{SentimenTwGK2023,
  author={Gajewska, Ewelina and Konat, Barbara},
  title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
  year={2023},
  howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}
```

**APA:**

```
Gajewska, E., & Konat, B. (2023).
SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media.
https://huggingface.co/eevvgg/sentimenTw-political.

```