eevvgg committed on
Commit
6439e49
1 Parent(s): 2eb7563

create readme

  1. README.md +152 -0
README.md ADDED
@@ -0,0 +1,152 @@
+ ---
+ language:
+ - pl
+ - en
+ pipeline_tag: text-classification
+ widget:
+ - text: TRUMP needs undecided voters
+   example_title: example 1
+ - text: Oczywiście ze Pan Prezydent to nasza duma narodowa!!
+   example_title: example 2
+ tags:
+ - text
+ - sentiment
+ - politics
+ - text-classification
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ model-index:
+ - name: sentimenTw-political
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       type: social media
+       name: politics
+     metrics:
+     - type: f1 macro
+       value: 71.2
+     - type: accuracy
+       value: 74
+ ---
+
+ - **Developed by:** Ewelina Gajewska as part of the ComPathos project: https://www.ncn.gov.pl/sites/default/files/listy-rankingowe/2020-09-30apsv2/streszczenia/497124-en.pdf
+
+ - **Model type:** XLM-RoBERTa for sentiment classification
+ - **Language(s) (NLP):** Multilingual; fine-tuned on 1k English texts from Reddit and 1k Polish tweets
+ - **License:** [More Information Needed]
+ - **Finetuned from model:** [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)
+
+ ## Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
+ - **Paper:** TBA
+ - **BibTeX citation:**
+ ```
+ @misc{SentimenTwGK2023,
+   author={Gajewska, Ewelina and Konat, Barbara},
+   title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
+   year={2023},
+   howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
+ }
+ ```
+
+ # Uses
+
+ The model performs sentiment classification on multilingual data. It was fine-tuned on a sample of 2k English and Polish social media texts from the political domain.
+ It is suited for short texts (up to 200 tokens).
+
+
+ ## How to Get Started with the Model
+
+ ```
+ from transformers import pipeline
+
+ # Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
+ model_path = "eevvgg/sentimenTw-political"
+ sentiment_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)
+
+ sequence = [
+     "TRUMP needs undecided voters",
+     "Oczywiście ze Pan Prezydent to nasza duma narodowa!!",
+ ]
+
+ # Each result is a dict with a "label" and a "score".
+ result = sentiment_task(sequence)
+ labels = [i["label"] for i in result]  # ['neutral', 'positive']
+ ```
+
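Because the card recommends inputs of up to 200 tokens, longer texts may need truncating before classification. The following is a minimal sketch, assuming the text-classification pipeline forwards `truncation` and `max_length` to the tokenizer; the helper names `build_classifier` and `classify` and the constant `MAX_TOKENS` are illustrative and not part of the model's API.

```python
MAX_TOKENS = 200  # input length the card recommends; the constant name is illustrative


def build_classifier(model_path="eevvgg/sentimenTw-political"):
    """Load the pipeline lazily; the first call downloads the model weights."""
    from transformers import pipeline

    return pipeline(task="text-classification", model=model_path, tokenizer=model_path)


def classify(texts, classifier, batch_size=8):
    """Return one predicted label per text, truncating inputs to MAX_TOKENS tokens."""
    results = classifier(
        list(texts),
        batch_size=batch_size,
        truncation=True,        # cut off anything beyond max_length
        max_length=MAX_TOKENS,
    )
    return [r["label"] for r in results]
```

With the model downloaded, `classify(sequence, build_classifier())` should return one label per input text.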
+ # Training Details
+
+
+ ## Training Procedure [optional]
+
+ - Trained for 3 epochs with a mini-batch size of 8.
+ - Training results: loss of 0.515
+ - See details in the [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
+
+ ### Preprocessing
+
+ - Hyperlinks and user mentions (@) are normalized to "http" and "@user" tokens, respectively. Extra spaces are removed.
+
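The normalization described above can be sketched with a few regular expressions. This is an assumed reimplementation for illustration, not the exact code used for training (see the Colab notebook for that); the function name `preprocess` and the patterns are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+")       # hyperlinks
MENTION_RE = re.compile(r"(?<!\w)@\w+")    # @handles, but not the "@" inside e-mail addresses
SPACES_RE = re.compile(r"\s+")             # runs of whitespace


def preprocess(text):
    """Replace links with "http", mentions with "@user", and collapse extra spaces."""
    text = URL_RE.sub("http", text)
    text = MENTION_RE.sub("@user", text)
    return SPACES_RE.sub(" ", text).strip()


print(preprocess("@JanKowalski  look at   https://example.com/a?b=1 now"))
# @user look at http now
```

Applying the same normalization at inference time keeps inputs consistent with what the model saw during fine-tuning.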
+ ### Speeds, Sizes, Times
+
+ - See the [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
+
+
+ # Evaluation
+
+
+ ## Testing Data, Factors & Metrics
+
+ ### Testing Data
+
+ - A sample of 200 texts (10% of the data)
+
+ ## Results
+
+ - accuracy: 74.0%
+ - macro avg:
+   - f1: 71.2%
+   - precision: 72.8%
+   - recall: 70.8%
+ - weighted avg:
+   - f1: 73.3%
+   - precision: 74.0%
+   - recall: 74.0%
+
+
+ | class | precision | recall | f1-score | support |
+ |-------|-----------|--------|----------|---------|
+ | 0     | 0.752     | 0.901  | 0.820    | 91      |
+ | 1     | 0.764     | 0.592  | 0.667    | 71      |
+ | 2     | 0.667     | 0.632  | 0.649    | 38      |
+
+
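The macro and weighted averages reported above follow directly from the per-class rows of the report. As a quick check, the sketch below recomputes them; the `report` dictionary and the `averages` helper are illustrative names, and the values are copied from the per-class rows.

```python
# Per-class scores copied from the report: (precision, recall, f1, support)
report = {
    0: (0.752, 0.901, 0.820, 91),
    1: (0.764, 0.592, 0.667, 71),
    2: (0.667, 0.632, 0.649, 38),
}


def averages(report):
    """Return (macro, weighted) averages, each as [precision, recall, f1]."""
    rows = list(report.values())
    total = sum(row[3] for row in rows)
    # Macro: every class counts equally, regardless of its size.
    macro = [sum(row[i] for row in rows) / len(rows) for i in range(3)]
    # Weighted: every class counts in proportion to its support.
    weighted = [sum(row[i] * row[3] for row in rows) / total for i in range(3)]
    return macro, weighted


macro, weighted = averages(report)
print([round(100 * m, 1) for m in macro])     # [72.8, 70.8, 71.2]
print([round(100 * w, 1) for w in weighted])  # [74.0, 74.0, 73.3]
```

The weighted recall matches the accuracy (74.0), as expected for single-label classification, while the lower macro scores reflect weaker performance on the two smaller classes.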
+ ### Summary
+
+
+ # Citation
+
+ **BibTeX:**
+
+ ```
+ @misc{SentimenTwGK2023,
+   author={Gajewska, Ewelina and Konat, Barbara},
+   title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
+   year={2023},
+   howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
+ }
+ ```
+
+ **APA:**
+
+ ```
+ Gajewska, E., & Konat, B. (2023). SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media. https://huggingface.co/eevvgg/sentimenTw-political.
+ ```