---
language:
- pl
- en
pipeline_tag: text-classification
widget:
- text: TRUMP needs undecided voters
example_title: example 1
- text: Oczywiście ze Pan Prezydent to nasza duma narodowa!!
example_title: example 2
tags:
- text
- sentiment
- politics
- text-classification
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: sentimenTw-political
results:
- task:
type: text-classification
name: Text Classification
dataset:
type: social media
name: politics
metrics:
- type: f1 macro
value: 71.2
- type: accuracy
value: 74
---
# eevvgg/sentimenTw-political
This model is a fine-tuned version of the multilingual model [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment).
It classifies text sentiment into three categories: negative, neutral, and positive.
It was fine-tuned on a 2k sample of manually annotated Reddit (EN) and Twitter (PL) data.
- **Developed by:** Ewelina Gajewska as part of the ComPathos project: https://www.ncn.gov.pl/sites/default/files/listy-rankingowe/2020-09-30apsv2/streszczenia/497124-en.pdf
- **Model type:** XLM-RoBERTa for sentiment classification
- **Language(s) (NLP):** Multilingual; fine-tuned on 1k English texts from Reddit and 1k Polish tweets
- **License:** [More Information Needed]
- **Finetuned from model:** [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)
## Model Sources
- **Repository:** [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
- **Paper:** TBA
# Uses
Sentiment classification in multilingual data. Fine-tuned on a 2k sample of English and Polish social media texts from the political domain.
The model is suited for short texts (up to 200 tokens).
## How to Get Started with the Model
```
from transformers import pipeline

model_path = "eevvgg/sentimenTw-political"
sentiment_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)

sequence = ["TRUMP needs undecided voters",
            "Oczywiście ze Pan Prezydent to nasza duma narodowa!!"]

result = sentiment_task(sequence)
labels = [i["label"] for i in result]  # ['neutral', 'positive']
```
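Each prediction is returned as a `{'label', 'score'}` dict, so the confidence score is available alongside the label. A minimal post-processing sketch that flags low-confidence predictions for manual review (the scores and the 0.6 threshold below are illustrative, not taken from the model):

```
# One {'label', 'score'} dict per input sequence, as returned by the pipeline above
# (scores here are made up for illustration).
result = [
    {"label": "neutral", "score": 0.86},
    {"label": "positive", "score": 0.54},
]

THRESHOLD = 0.6  # assumed cut-off; tune per application

def triage(predictions, threshold=THRESHOLD):
    """Keep confident labels; mark the rest for manual annotation."""
    return [p["label"] if p["score"] >= threshold else "review" for p in predictions]

print(triage(result))  # ['neutral', 'review']
```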
# Training Details
## Training Procedure
- Trained for 3 epochs with a mini-batch size of 8.
- Final training loss: 0.515
- See details in the [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
### Preprocessing
- Hyperlinks and user mentions (@) normalized to "http" and "@user" tokens, respectively; extra spaces removed.
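The normalization above can be sketched with plain regular expressions; the exact patterns used in training are in the Colab notebook, so the ones below are an assumption:

```
import re

def normalize(text: str) -> str:
    """Approximate the preprocessing described above (assumed patterns)."""
    text = re.sub(r"https?://\S+", "http", text)  # hyperlinks -> "http"
    text = re.sub(r"@\w+", "@user", text)         # user mentions -> "@user"
    text = re.sub(r"\s+", " ", text).strip()      # collapse extra spaces
    return text

print(normalize("Vote now!   https://example.com @JoeDoe"))
# -> "Vote now! http @user"
```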
### Speeds, Sizes, Times
- See [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
# Evaluation
## Testing Data, Factors & Metrics
### Testing Data
- A sample of 200 texts (10% of the data)
## Results
- accuracy: 74.0
- macro avg:
- f1: 71.2
- precision: 72.8
- recall: 70.8
- weighted avg:
- f1: 73.3
- precision: 74.0
- recall: 74.0
| class | precision | recall | f1-score | support |
|-------|-----------|--------|----------|---------|
| 0     | 0.752     | 0.901  | 0.820    | 91      |
| 1     | 0.764     | 0.592  | 0.667    | 71      |
| 2     | 0.667     | 0.632  | 0.649    | 38      |
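The macro and weighted averages above follow directly from the per-class figures; a quick arithmetic check:

```
# Reconstruct the aggregate metrics from the per-class report above.
per_class = {
    # label: (precision, recall, f1, support)
    0: (0.752, 0.901, 0.820, 91),
    1: (0.764, 0.592, 0.667, 71),
    2: (0.667, 0.632, 0.649, 38),
}

n = sum(s for *_, s in per_class.values())  # 200 test texts

macro_f1 = sum(f for _, _, f, _ in per_class.values()) / len(per_class)
weighted_f1 = sum(f * s for _, _, f, s in per_class.values()) / n
accuracy = sum(r * s for _, r, _, s in per_class.values()) / n  # support-weighted recall

print(round(macro_f1 * 100, 1))     # 71.2
print(round(weighted_f1 * 100, 1))  # 73.3
print(round(accuracy * 100, 1))     # 74.0
```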
# Citation
**BibTeX:**
```
@misc{SentimenTwGK2023,
author={Gajewska, Ewelina and Konat, Barbara},
title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
year={2023},
howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}
```
**APA:**
```
Gajewska, E., & Konat, B. (2023).
SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media.
https://huggingface.co/eevvgg/sentimenTw-political.
```