---
language:
- pl
- en
pipeline_tag: text-classification
widget:
- text: TRUMP needs undecided voters
  example_title: example 1
- text: Oczywiście ze Pan Prezydent to nasza duma narodowa!!
  example_title: example 2
tags:
- text
- sentiment
- politics
- text-classification
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: sentimenTw-political
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: social media
      name: politics
    metrics:
    - type: f1 macro
      value: 71.2
    - type: accuracy
      value: 74
---

# eevvgg/sentimenTw-political

This model is a fine-tuned version of the multilingual model [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment).
It classifies text sentiment into three categories: negative, neutral, and positive.
It was fine-tuned on a 2k sample of manually annotated Reddit (EN) and Twitter (PL) data.


- **Developed by:** Ewelina Gajewska as part of the ComPathos project: https://www.ncn.gov.pl/sites/default/files/listy-rankingowe/2020-09-30apsv2/streszczenia/497124-en.pdf

- **Model type:** RoBERTa for sentiment classification
- **Language(s) (NLP):** Multilingual; fine-tuned on 1k English texts from Reddit and 1k Polish tweets
- **License:** [More Information Needed]
- **Finetuned from model:** [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)


# Uses

Sentiment classification of multilingual data. The model was fine-tuned on a 2k sample of English and Polish social media texts from the political domain.
It is suited to short texts (up to 200 tokens).


## How to Get Started with the Model

```
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_path = "eevvgg/sentimenTw-political"
sentiment_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)

sequence = ["TRUMP needs undecided voters",
            "Oczywiście ze Pan Prezydent to nasza duma narodowa!!"]

# Each result is a dict with 'label' and 'score' keys
result = sentiment_task(sequence)
labels = [i['label'] for i in result]  # ['neutral', 'positive']
```
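
Since the model is suited to short texts, longer inputs can be truncated at inference time. The sketch below shows one possible way to do that, assuming the text-classification pipeline forwards `truncation` and `max_length` to the tokenizer when called. The helper name `classify_short_texts` and the use of 200 as `max_length` are illustrative assumptions, not part of the original training code.

```python
from typing import Callable, List, Tuple


def classify_short_texts(
    classifier: Callable,
    texts: List[str],
    max_length: int = 200,  # assumption: mirrors the "up to 200 tokens" guidance
) -> List[Tuple[str, str, float]]:
    """Classify texts, truncating each input to at most `max_length` tokens."""
    # Tokenizer arguments are passed at call time and applied to every input.
    results = classifier(texts, truncation=True, max_length=max_length)
    return [(text, r["label"], float(r["score"])) for text, r in zip(texts, results)]


# Usage (downloads the model from the Hub on first run):
# from transformers import pipeline
# clf = pipeline("text-classification", model="eevvgg/sentimenTw-political")
# classify_short_texts(clf, ["TRUMP needs undecided voters"])
```

Returning the score alongside the label makes it easy to filter out low-confidence predictions later.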


## Model Sources 

- **Repository:** [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
- **Paper:** TBA

# Training Details

- Trained for 3 epochs with a mini-batch size of 8.
- Training loss: 0.515
- See details in the [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)

### Preprocessing

- Hyperlinks and user mentions (@) are normalized to "http" and "@user" tokens, respectively. Extra spaces are removed.
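
The normalization described above could look roughly like the following. This is a minimal sketch, not the exact code used for training (see the Colab notebook for that): the function name `normalize_text` and the regular expressions are assumptions chosen for illustration.

```python
import re

# Assumed patterns; the original notebook may match links and mentions differently.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")


def normalize_text(text: str) -> str:
    """Replace links with 'http', mentions with '@user', and collapse whitespace."""
    text = URL_RE.sub("http", text)       # links first, so '@' inside a URL is left alone
    text = MENTION_RE.sub("@user", text)  # any @handle becomes the generic token
    return SPACE_RE.sub(" ", text).strip()


print(normalize_text("@jan_kowalski  see https://example.com/a?b=1   now"))
# @user see http now
```

Applying the same normalization at inference time keeps inputs consistent with what the model saw during fine-tuning.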

### Speeds, Sizes, Times

- See [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)

# Evaluation

- Evaluation was run on a sample of 200 texts (10% of the data).

## Results

- accuracy: 74.0
- macro avg:
  - f1: 71.2
  - precision: 72.8
  - recall: 70.8
- weighted avg:
  - f1: 73.3
  - precision: 74.0
  - recall: 74.0


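| class | precision | recall | f1-score | support |
|----------|-----------|--------|----------|---------|
| negative | 0.752 | 0.901 | 0.820 | 91 |
| neutral | 0.764 | 0.592 | 0.667 | 71 |
| positive | 0.667 | 0.632 | 0.649 | 38 |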
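
The macro and weighted averages reported above follow from the per-class scores. As a quick check, the sketch below recomputes them: the macro average is the unweighted mean over the three classes, and the weighted average weights each class by its support. The variable and function names are illustrative.

```python
# Per-class scores from the results above: (precision, recall, f1, support)
report = {
    "negative": (0.752, 0.901, 0.820, 91),
    "neutral": (0.764, 0.592, 0.667, 71),
    "positive": (0.667, 0.632, 0.649, 38),
}


def macro_avg(idx: int) -> float:
    """Unweighted mean of one metric across classes."""
    return sum(row[idx] for row in report.values()) / len(report)


def weighted_avg(idx: int) -> float:
    """Mean of one metric across classes, weighted by class support."""
    total = sum(row[3] for row in report.values())
    return sum(row[idx] * row[3] for row in report.values()) / total


for name, idx in [("precision", 0), ("recall", 1), ("f1", 2)]:
    print(f"{name}: macro={100 * macro_avg(idx):.1f} weighted={100 * weighted_avg(idx):.1f}")
```

The support-weighted recall equals the overall accuracy, which is why both are reported as 74.0.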



# Citation 

**BibTeX:**

```
@misc{SentimenTwGK2023,
  author={Gajewska, Ewelina and Konat, Barbara},
  title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
  year={2023},
  howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}
```

**APA:**

```
Gajewska, E., & Konat, B. (2023).
SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media.
https://huggingface.co/eevvgg/sentimenTw-political.

```