---
language:
- pl
- en
pipeline_tag: text-classification
widget:
- text: TRUMP needs undecided voters
  example_title: example 1
- text: Oczywiście ze Pan Prezydent to nasza duma narodowa!!
  example_title: example 2
tags:
- text
- sentiment
- politics
- text-classification
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: sentimenTw-political
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: social media
      name: politics
    metrics:
    - type: f1 macro
      value: 71.2
    - type: accuracy
      value: 74
---

# eevvgg/sentimenTw-political

This model is a fine-tuned version of the multilingual model [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment). It classifies text sentiment into three categories: negative, neutral, and positive. It was fine-tuned on a sample of 2k manually annotated Reddit (EN) and Twitter (PL) texts.

- **Developed by:** Ewelina Gajewska as part of the ComPathos project: https://www.ncn.gov.pl/sites/default/files/listy-rankingowe/2020-09-30apsv2/streszczenia/497124-en.pdf
- **Model type:** XLM-RoBERTa for sentiment classification
- **Language(s) (NLP):** Multilingual; fine-tuned on 1k English texts from Reddit and 1k Polish tweets
- **License:** [More Information Needed]
- **Fine-tuned from model:** [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)

# Uses

Sentiment classification of multilingual data. The model was fine-tuned on a 2k sample of English and Polish social media texts from the political domain and is suited to short texts (up to 200 tokens).
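Because the training texts had hyperlinks normalized to `http`, user mentions normalized to `@user`, and extra spaces removed (see Preprocessing under Training Details), applying the same normalization to new inputs may keep them closer to what the model saw during fine-tuning. The sketch below is an approximation under stated assumptions: the function name `preprocess` is hypothetical, tokens are assumed to be whitespace-separated, and any token beginning with `http` is treated as a link. The exact rules used for training are in the Colab notebook and may differ.

```python
def preprocess(text: str) -> str:
    """Approximate the normalization described under Preprocessing.

    Hypothetical helper. Assumptions: tokens are whitespace-separated,
    a token starting with "@" (longer than one character) is a user
    mention, and a token starting with "http" is a hyperlink.
    """
    tokens = []
    for token in text.split():  # split() with no argument also collapses extra spaces
        if token.startswith("@") and len(token) > 1:
            token = "@user"  # user mention -> placeholder token
        elif token.startswith("http"):
            token = "http"  # hyperlink -> placeholder token
        tokens.append(token)
    return " ".join(tokens)


print(preprocess("Thanks @jan_kowalski   read more at https://example.com/post"))
# Thanks @user read more at http
```

The normalized string can then be passed to the classification pipeline in place of the raw text.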
## How to Get Started with the Model

```python
from transformers import pipeline

model_path = "eevvgg/sentimenTw-political"
sentiment_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)

sequence = [
    "TRUMP needs undecided voters",
    # Polish: "Of course, Mr. President is our national pride!!"
    "Oczywiście ze Pan Prezydent to nasza duma narodowa!!",
]
result = sentiment_task(sequence)  # one {'label': ..., 'score': ...} dict per input
labels = [i['label'] for i in result]  # ['neutral', 'positive']
print(labels)
```

## Model Sources

- **Repository:** [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
- **Paper:** TBA
- **BibTeX citation:**

```
@misc{SentimenTwGK2023,
  author={Gajewska, Ewelina and Konat, Barbara},
  title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
  year={2023},
  howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}
```

# Training Details

- Trained for 3 epochs with a mini-batch size of 8.
- Training loss: 0.515.
- See details in the [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing).

### Preprocessing

- Hyperlinks and user mentions (@) are normalized to the "http" and "@user" tokens, respectively, and extra spaces are removed.

### Speeds, Sizes, Times

- See the [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing).

# Evaluation

- Evaluation was run on a sample of 200 texts (10% of the data).
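The macro and weighted averages in the results that follow can be recovered from the per-class scores: the macro average is the unweighted mean over the three classes, and the weighted average weights each class by its support (91 + 71 + 38 = 200 texts). These match the definitions used by scikit-learn's `classification_report`, though this card does not state which tool produced the numbers. A small sketch that recomputes them from the rounded per-class values (so the last digit could in principle differ slightly):

```python
# Per-class scores from the evaluation table: (precision, recall, f1, support)
per_class = {
    "negative": (0.752, 0.901, 0.820, 91),
    "neutral": (0.764, 0.592, 0.667, 71),
    "positive": (0.667, 0.632, 0.649, 38),
}


def averages(scores):
    """Return macro and support-weighted averages of (precision, recall, f1)."""
    total = sum(s[3] for s in scores.values())  # total number of evaluated texts
    n = len(scores)  # number of classes
    macro = [sum(s[i] for s in scores.values()) / n for i in range(3)]
    weighted = [sum(s[i] * s[3] for s in scores.values()) / total for i in range(3)]
    return macro, weighted


macro, weighted = averages(per_class)
for name, m, w in zip(("precision", "recall", "f1"), macro, weighted):
    print(f"{name}: macro {100 * m:.1f}, weighted {100 * w:.1f}")
# precision: macro 72.8, weighted 74.0
# recall: macro 70.8, weighted 74.0
# f1: macro 71.2, weighted 73.3
```

For single-label classification, weighted recall equals overall accuracy: it sums each class's true positives and divides by the total number of texts, which is why both are reported as 74.0.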
## Results

- accuracy: 74.0
- macro avg:
  - f1: 71.2
  - precision: 72.8
  - recall: 70.8
- weighted avg:
  - f1: 73.3
  - precision: 74.0
  - recall: 74.0

|          | precision | recall | f1-score | support |
|----------|-----------|--------|----------|---------|
| negative | 0.752     | 0.901  | 0.820    | 91      |
| neutral  | 0.764     | 0.592  | 0.667    | 71      |
| positive | 0.667     | 0.632  | 0.649    | 38      |

# Citation

**BibTeX:**

```
@misc{SentimenTwGK2023,
  author={Gajewska, Ewelina and Konat, Barbara},
  title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
  year={2023},
  howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}
```

**APA:**

```
Gajewska, E., & Konat, B. (2023). SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media. https://huggingface.co/eevvgg/sentimenTw-political.
```