---
language:
- pl
- en
pipeline_tag: text-classification
widget:
- text: TRUMP needs undecided voters
  example_title: example 1
- text: Oczywiście ze Pan Prezydent to nasza duma narodowa!!
  example_title: example 2
tags:
- text
- sentiment
- politics
- text-classification
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: sentimenTw-political
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: social media
      name: politics
    metrics:
    - type: f1 macro
      value: 71.2
    - type: accuracy
      value: 74
---
- **Developed by:** Ewelina Gajewska as part of the ComPathos project: https://www.ncn.gov.pl/sites/default/files/listy-rankingowe/2020-09-30apsv2/streszczenia/497124-en.pdf
- **Model type:** XLM-RoBERTa (base) for sentiment classification
- **Language(s) (NLP):** Multilingual; fine-tuned on 1k English texts from Reddit and 1k Polish tweets
- **License:** [More Information Needed]
- **Fine-tuned from model:** [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)
## Model Sources
- **Repository:** [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
- **Paper:** TBA
# Uses
Sentiment classification of multilingual data. The model was fine-tuned on a sample of 2k English and Polish social media texts from the political domain.
It is suited to short texts (up to 200 tokens).
## How to Get Started with the Model
```python
from transformers import pipeline

model_path = "eevvgg/sentimenTw-political"
sentiment_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)

sequences = [
    "TRUMP needs undecided voters",
    # Polish: "Of course, Mr. President is our national pride!!"
    "Oczywiście ze Pan Prezydent to nasza duma narodowa!!",
]
result = sentiment_task(sequences)  # one {'label': ..., 'score': ...} dict per input
labels = [r["label"] for r in result]  # ['neutral', 'positive']
```
# Training Details
## Training Procedure
- Trained for 3 epochs with a mini-batch size of 8.
- Training loss: 0.515.
- See details in the [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing).
### Preprocessing
- Normalization of hyperlinks and user mentions (@) to "http" and "@user" tokens, respectively.
- Removal of extra spaces.
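The normalization steps above can be sketched as follows. This is a minimal sketch, not the exact code from the training notebook. The function name `preprocess` is hypothetical. The detection rules are also assumptions: a mention is any token starting with "@", and a link is any token starting with "http".

```python
def preprocess(text: str) -> str:
    """Hypothetical normalizer: replaces user mentions with "@user",
    hyperlinks with "http", and collapses runs of whitespace."""
    tokens = []
    # str.split() with no argument splits on any run of whitespace,
    # so extra spaces, tabs and newlines are dropped here.
    for token in text.split():
        if token.startswith("@") and len(token) > 1:
            token = "@user"  # assumption: any "@handle" counts as a mention
        elif token.startswith("http"):
            token = "http"  # assumption: links are detected by their "http" prefix
        tokens.append(token)
    return " ".join(tokens)


print(preprocess("@JohnDoe  check   this https://example.com/x now"))
# @user check this http now
```

Applying the same normalization at inference time keeps inputs close to the format the model saw during fine-tuning.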
### Speeds, Sizes, Times
- See the [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing).
# Evaluation
## Testing Data, Factors & Metrics
### Testing Data
- A sample of 200 texts (10% of the data).
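The card does not say how the 200 test texts were drawn. Assuming a simple random 10% holdout of the 2k-text sample, the split could be sketched as below. The helper `holdout_split` and the seed value are hypothetical and not taken from the original notebook.

```python
import random


def holdout_split(texts, test_fraction=0.1, seed=42):
    """Hypothetical helper: shuffle a copy of the data and hold out
    `test_fraction` of it as a test set. Returns (train, test)."""
    items = list(texts)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


train, test = holdout_split(range(2000))
print(len(train), len(test))  # 1800 200
```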
## Results
- accuracy: 74.0
- macro avg:
  - f1: 71.2
  - precision: 72.8
  - recall: 70.8
- weighted avg:
  - f1: 73.3
  - precision: 74.0
  - recall: 74.0

Per-class classification report:

| class | precision | recall | f1-score | support |
|------:|----------:|-------:|---------:|--------:|
| 0 | 0.752 | 0.901 | 0.820 | 91 |
| 1 | 0.764 | 0.592 | 0.667 | 71 |
| 2 | 0.667 | 0.632 | 0.649 | 38 |
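As a consistency check, the macro and weighted averages can be recomputed from the per-class rows of the classification report.

- The macro average is the unweighted mean over the three classes.
- The weighted average weights each class by its support.
- Weighted recall therefore equals accuracy.

The sketch below is plain Python. The dictionary layout and helper names are illustrative and not taken from the original notebook. Because the per-class values are rounded to three decimals, the recomputed averages match the reported ones only to one decimal place.

```python
# Per-class values copied from the classification report:
# (precision, recall, f1, support) for classes 0, 1 and 2.
report = {
    0: (0.752, 0.901, 0.820, 91),
    1: (0.764, 0.592, 0.667, 71),
    2: (0.667, 0.632, 0.649, 38),
}


def macro_avg(index):
    """Unweighted mean of one metric across classes."""
    return sum(row[index] for row in report.values()) / len(report)


def weighted_avg(index):
    """Support-weighted mean of one metric across classes."""
    total = sum(row[3] for row in report.values())
    return sum(row[index] * row[3] for row in report.values()) / total


for name, i in [("precision", 0), ("recall", 1), ("f1", 2)]:
    print(f"{name}: macro {100 * macro_avg(i):.1f}, weighted {100 * weighted_avg(i):.1f}")
# precision: macro 72.8, weighted 74.0
# recall: macro 70.8, weighted 74.0
# f1: macro 71.2, weighted 73.3
```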
# Citation
**BibTeX:**
```
@misc{SentimenTwGK2023,
  author={Gajewska, Ewelina and Konat, Barbara},
  title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
  year={2023},
  howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}
```
**APA:**
```
Gajewska, E., & Konat, B. (2023). SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media. https://huggingface.co/eevvgg/sentimenTw-political.
```