---
language:
  - pl
  - en
pipeline_tag: text-classification
widget:
  - text: TRUMP needs undecided voters
    example_title: example 1
  - text: Oczywiście ze Pan Prezydent to nasza duma narodowa!!
    example_title: example 2
tags:
  - text
  - sentiment
  - politics
  - text-classification
metrics:
  - accuracy
  - f1
  - precision
  - recall
model-index:
  - name: sentimenTw-political
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          type: social media
          name: politics
        metrics:
          - type: f1
            name: macro F1
            value: 71.2
          - type: accuracy
            value: 74.0
---

eevvgg/sentimenTw-political

This model is a fine-tuned version of the multilingual model cardiffnlp/twitter-xlm-roberta-base-sentiment. It classifies text sentiment into 3 categories: negative, neutral, positive. The model was fine-tuned on a 2k sample of manually annotated Reddit (EN) and Twitter (PL) data.

Model Sources

Repository: https://huggingface.co/eevvgg/sentimenTw-political (full citation in the Citation section below).

Uses

Sentiment classification of multilingual data. Fine-tuned on a 2k sample of English and Polish social media texts from the political domain. The model is suited for short texts (up to 200 tokens).
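Because the model targets short texts, longer inputs can be truncated at the tokenizer level. A minimal sketch; the 200-token cap comes from this card, while the sample text is invented:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("eevvgg/sentimenTw-political")
# truncate anything beyond 200 tokens before classification
encoded = tokenizer("a long political post ...", truncation=True, max_length=200)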

How to Get Started with the Model

from transformers import pipeline

model_path = "eevvgg/sentimenTw-political"
sentiment_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)

sequence = ["TRUMP needs undecided voters",
            "Oczywiście ze Pan Prezydent to nasza duma narodowa!!"]

result = sentiment_task(sequence)      # one prediction dict per input text
labels = [i['label'] for i in result]  # ['neutral', 'positive']
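Each prediction is a dict with a label and a confidence score. A sketch of keeping the scores as well; top_k=None assumes a reasonably recent transformers release (older versions use return_all_scores=True instead):

scores = [i['score'] for i in result]              # confidence of the predicted label
all_scores = sentiment_task(sequence, top_k=None)  # per-class scores for every text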

Training Details

Training Procedure

  • Trained for 3 epochs with a mini-batch size of 8 (see the sketch below).
  • Final training loss: 0.515.
  • See details in the Colab notebook.
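A minimal fine-tuning sketch using the Hugging Face Trainer with the hyperparameters above. Here train_ds and eval_ds stand for hypothetical tokenized datasets, and anything not stated in this card (output path, remaining hyperparameters) is an assumption:

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base_model = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model)  # base already has 3 sentiment labels

args = TrainingArguments(
    output_dir="sentimenTw-political",  # assumed output path
    num_train_epochs=3,                 # from this card
    per_device_train_batch_size=8,      # mini-batch size from this card
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,  # hypothetical tokenized train split
    eval_dataset=eval_ds,    # hypothetical tokenized eval split
)
trainer.train()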

Preprocessing

  • Hyperlinks and user mentions (@) are normalized to "http" and "@user" tokens, respectively, and extra spaces are removed (see the sketch below).
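The card does not give the exact patterns; a minimal sketch of this normalization with assumed regexes:

import re

def normalize(text: str) -> str:
    text = re.sub(r"https?://\S+", "http", text)  # hyperlinks -> generic "http" token
    text = re.sub(r"@\w+", "@user", text)         # user mentions -> generic "@user" token
    return re.sub(r"\s+", " ", text).strip()      # collapse extra whitespace

normalize("Vote now!!  https://example.com @JoeVoter")  # -> 'Vote now!! http @user'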

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • A sample of 200 texts (10% of the data)

Results

  • accuracy: 74.0
  • macro avg:
    • f1: 71.2
    • precision: 72.8
    • recall: 70.8
  • weighted avg:
    • f1: 73.3
    • precision: 74.0
    • recall: 74.0

             precision    recall  f1-score   support

           0      0.752     0.901     0.820        91
           1      0.764     0.592     0.667        71
           2      0.667     0.632     0.649        38
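A report of this shape can be produced with scikit-learn; a sketch where y_true and y_pred stand for hypothetical gold and predicted label ids on the 200-text test set:

from sklearn.metrics import classification_report

print(classification_report(y_true, y_pred, digits=3))  # per-class precision/recall/F1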

Citation

BibTeX:

@misc{SentimenTwGK2023,
  author={Gajewska, Ewelina and Konat, Barbara},
  title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
  year={2023},
  howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}

APA:

Gajewska, E., & Konat, B. (2023).
SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media.
https://huggingface.co/eevvgg/sentimenTw-political.