---
language:
- pl
- en
pipeline_tag: text-classification
widget:
- text: TRUMP needs undecided voters
  example_title: example 1
- text: Oczywiście ze Pan Prezydent to nasza duma narodowa!!
  example_title: example 2
tags:
- text
- sentiment
- politics
- text-classification
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: sentimenTw-political
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: social media
      name: politics
    metrics:
    - type: f1 macro
      value: 71.2
    - type: accuracy
      value: 74
---
# eevvgg/sentimenTw-political
This model is a fine-tuned version of the multilingual model [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment).
It classifies text sentiment into three categories: negative, neutral, and positive.
It was fine-tuned on a 2k sample of manually annotated Reddit (EN) and Twitter (PL) data.
- **Developed by:** Ewelina Gajewska as part of the ComPathos project: https://www.ncn.gov.pl/sites/default/files/listy-rankingowe/2020-09-30apsv2/streszczenia/497124-en.pdf
- **Model type:** RoBERTa for sentiment classification
- **Language(s) (NLP):** Multilingual; fine-tuned on 1k English texts from Reddit and 1k Polish tweets
- **License:** [More Information Needed]
- **Finetuned from model:** [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)
# Uses
Sentiment classification of multilingual data. The model was fine-tuned on a 2k sample of English and Polish social media texts from the political domain.
It is suited to short texts (up to 200 tokens).
## How to Get Started with the Model
```
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model_path = "eevvgg/sentimenTw-political"
sentiment_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)
# One English and one Polish example.
sequence = ["TRUMP needs undecided voters",
            "Oczywiście ze Pan Prezydent to nasza duma narodowa!!"]
# Each result is a dict with 'label' and 'score' keys.
result = sentiment_task(sequence)
labels = [i['label'] for i in result]  # ['neutral', 'positive']
```
## Model Sources
- **Repository:** [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
- **Paper:** TBA
- **BibTeX citation:**
```
@misc{SentimenTwGK2023,
author={Gajewska, Ewelina and Konat, Barbara},
title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
year={2023},
howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}
```
# Training Details
- Trained for 3 epochs with a mini-batch size of 8.
- Training loss: 0.515
- See the [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing) for details.
### Preprocessing
- Hyperlinks and user mentions (@) are normalized to "http" and "@user" tokens, respectively. Extra spaces are removed.
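
The normalization described above can be sketched as follows. This is a minimal illustration that assumes simple whitespace tokenization; the function name `normalize_text` is hypothetical, and the exact rules used during training may differ (see the Colab notebook).

```python
def normalize_text(text: str) -> str:
    """Replace user mentions with "@user" and hyperlinks with "http".

    Assumption: tokens are separated by whitespace. Calling split() with no
    arguments also collapses runs of spaces, which removes extra spaces.
    """
    tokens = []
    for token in text.split():
        if token.startswith("@") and len(token) > 1:
            token = "@user"   # user mention
        elif token.startswith("http"):
            token = "http"    # hyperlink
        tokens.append(token)
    return " ".join(tokens)


example = normalize_text("@JoeBiden   check  https://example.com now")
print(example)
```

Applying the same normalization to input text before inference keeps it consistent with the data the model was fine-tuned on.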
### Speeds, Sizes, Times
- See [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
# Evaluation
- Evaluation was run on a sample of 200 texts (10% of the data).
## Results
- accuracy: 74.0
- macro avg:
  - f1: 71.2
  - precision: 72.8
  - recall: 70.8
- weighted avg:
  - f1: 73.3
  - precision: 74.0
  - recall: 74.0

| class | precision | recall | f1-score | support |
|---|---|---|---|---|
| negative | 0.752 | 0.901 | 0.820 | 91 |
| neutral | 0.764 | 0.592 | 0.667 | 71 |
| positive | 0.667 | 0.632 | 0.649 | 38 |
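
As a check on how the macro and weighted averages relate to the per-class figures above, the following sketch recomputes them in plain Python. The helper names `macro_average` and `weighted_average` are illustrative and not part of the original evaluation code; the per-class values are copied from the report.

```python
# Per-class scores from the report: (precision, recall, f1, support)
report = {
    "negative": (0.752, 0.901, 0.820, 91),
    "neutral": (0.764, 0.592, 0.667, 71),
    "positive": (0.667, 0.632, 0.649, 38),
}
METRICS = ("precision", "recall", "f1")


def macro_average(rows):
    """Unweighted mean over classes: every class counts equally."""
    n = len(rows)
    return {m: sum(r[i] for r in rows.values()) / n for i, m in enumerate(METRICS)}


def weighted_average(rows):
    """Mean over classes weighted by support (number of examples per class)."""
    total = sum(r[3] for r in rows.values())
    return {m: sum(r[i] * r[3] for r in rows.values()) / total
            for i, m in enumerate(METRICS)}


macro = macro_average(report)
weighted = weighted_average(report)
for name, scores in (("macro avg", macro), ("weighted avg", weighted)):
    print(name, {m: round(v * 100, 1) for m, v in scores.items()})
```

The weighted averages are higher than the macro averages because the largest class (negative, 91 of 200 texts) is also the one with the highest f1 score.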
# Citation
**BibTeX:**
```
@misc{SentimenTwGK2023,
author={Gajewska, Ewelina and Konat, Barbara},
title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
year={2023},
howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}
```
**APA:**
```
Gajewska, E., & Konat, B. (2023).
SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media.
https://huggingface.co/eevvgg/sentimenTw-political.
```