---
language:
  - pl
  - en
pipeline_tag: text-classification
widget:
  - text: TRUMP needs undecided voters
    example_title: example 1
  - text: Oczywiście ze Pan Prezydent to nasza duma narodowa!!
    example_title: example 2
tags:
  - text
  - sentiment
  - politics
  - text-classification
metrics:
  - accuracy
  - f1
  - precision
  - recall
model-index:
  - name: sentimenTw-political
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          type: social media
          name: politics
        metrics:
          - type: f1
            name: macro F1
            value: 71.2
          - type: accuracy
            value: 74.0
---

eevvgg/sentimenTw-political

This model is a fine-tuned version of the multilingual model cardiffnlp/twitter-xlm-roberta-base-sentiment. It classifies text sentiment into 3 categories: negative, neutral, positive. The model was fine-tuned on a 2k sample of manually annotated Reddit (EN) and Twitter (PL) data.

Model Sources

Repository: https://huggingface.co/eevvgg/sentimenTw-political (full citation in the Citation section below).

Uses

Sentiment classification of multilingual data. Fine-tuned on a 2k sample of English and Polish social media texts from the political domain. The model is suited for short texts (up to 200 tokens).
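Because the model targets short texts, longer inputs can be truncated at the tokenizer level. A minimal sketch; the 200-token cap comes from this card, while the sample text is invented:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("eevvgg/sentimenTw-political")
# truncate anything beyond 200 tokens before classification
encoded = tokenizer("a long political post ...", truncation=True, max_length=200)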

How to Get Started with the Model

from transformers import pipeline

model_path = "eevvgg/sentimenTw-political"
sentiment_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)

sequence = ["TRUMP needs undecided voters",
            "Oczywiście ze Pan Prezydent to nasza duma narodowa!!"]

result = sentiment_task(sequence)      # one prediction dict per input text
labels = [i['label'] for i in result]  # ['neutral', 'positive']
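Each prediction is a dict with a label and a confidence score. A sketch of keeping the scores as well; top_k=None assumes a reasonably recent transformers release (older versions use return_all_scores=True instead):

scores = [i['score'] for i in result]              # confidence of the predicted label
all_scores = sentiment_task(sequence, top_k=None)  # per-class scores for every text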

Training Details

Training Procedure

  • Trained for 3 epochs with a mini-batch size of 8 (see the sketch below).
  • Final training loss: 0.515.
  • See details in the Colab notebook.
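A minimal fine-tuning sketch using the Hugging Face Trainer with the hyperparameters above. Here train_ds and eval_ds stand for hypothetical tokenized datasets, and anything not stated in this card (output path, remaining hyperparameters) is an assumption:

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base_model = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model)  # base already has 3 sentiment labels

args = TrainingArguments(
    output_dir="sentimenTw-political",  # assumed output path
    num_train_epochs=3,                 # from this card
    per_device_train_batch_size=8,      # mini-batch size from this card
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,  # hypothetical tokenized train split
    eval_dataset=eval_ds,    # hypothetical tokenized eval split
)
trainer.train()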

Preprocessing

  • Hyperlinks and user mentions (@) are normalized to "http" and "@user" tokens, respectively, and extra spaces are removed (see the sketch below).
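The card does not give the exact patterns; a minimal sketch of this normalization with assumed regexes:

import re

def normalize(text: str) -> str:
    text = re.sub(r"https?://\S+", "http", text)  # hyperlinks -> generic "http" token
    text = re.sub(r"@\w+", "@user", text)         # user mentions -> generic "@user" token
    return re.sub(r"\s+", " ", text).strip()      # collapse extra whitespace

normalize("Vote now!!  https://example.com @JoeVoter")  # -> 'Vote now!! http @user'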

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • A sample of 200 texts (10% of the data)

Results

  • accuracy: 74.0
  • macro avg:
    • f1: 71.2
    • precision: 72.8
    • recall: 70.8
  • weighted avg:
    • f1: 73.3
    • precision: 74.0
    • recall: 74.0

             precision    recall  f1-score   support

           0      0.752     0.901     0.820        91
           1      0.764     0.592     0.667        71
           2      0.667     0.632     0.649        38
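A report of this shape can be produced with scikit-learn; a sketch where y_true and y_pred stand for hypothetical gold and predicted label ids on the 200-text test set:

from sklearn.metrics import classification_report

print(classification_report(y_true, y_pred, digits=3))  # per-class precision/recall/F1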

Citation

BibTeX:

@misc{SentimenTwGK2023,
  author={Gajewska, Ewelina and Konat, Barbara},
  title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
  year={2023},
  howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}

APA:

Gajewska, E., & Konat, B. (2023).
SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media.
https://huggingface.co/eevvgg/sentimenTw-political.