	---
	language:
	- pl
	- en
	pipeline_tag: text-classification
	widget:
	- text: TRUMP needs undecided voters
	  example_title: example 1
	- text: Oczywiście ze Pan Prezydent to nasza duma narodowa!!
	  example_title: example 2
	tags:
	- text
	- sentiment
	- politics
	- text-classification
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	model-index:
	- name: sentimenTw-political
	  results:
	  - task:
	      type: text-classification
	      name: Text Classification
	    dataset:
	      type: social media
	      name: politics
	    metrics:
	    - type: f1 macro
	      value: 71.2
	    - type: accuracy
	      value: 74
	---

	# eevvgg/sentimenTw-political

	This model is a fine-tuned version of the multilingual model [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment).
	It classifies text sentiment into three categories: negative, neutral, and positive.
	It was fine-tuned on a 2k sample of manually annotated Reddit (EN) and Twitter (PL) data.


	- Developed by: Ewelina Gajewska as part of the ComPathos project: https://www.ncn.gov.pl/sites/default/files/listy-rankingowe/2020-09-30apsv2/streszczenia/497124-en.pdf

	- Model type: RoBERTa for sentiment classification
	- Language(s) (NLP): Multilingual; fine-tuned on 1k English texts from Reddit and 1k Polish tweets
	- License: [More Information Needed]
	- Finetuned from model: [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)


	# Uses

	Sentiment classification in multilingual data. Fine-tuned on a 2k English and Polish sample of social media texts from the political domain.
	The model is suited for short texts (up to 200 tokens).
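
One way to respect the 200-token guideline is to count tokens per text before classification and set the longer ones aside. The sketch below is only an illustration and is not part of the original card: `split_by_length` and `count_tokens` are hypothetical names, and the whitespace counter is a stand-in for the model's own tokenizer.

```python
from typing import Callable, List, Tuple


def split_by_length(
    texts: List[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = 200,
) -> Tuple[List[str], List[str]]:
    """Separate texts that fit within max_tokens from those that exceed it."""
    short, too_long = [], []
    for text in texts:
        if count_tokens(text) <= max_tokens:
            short.append(text)
        else:
            too_long.append(text)
    return short, too_long


def whitespace_count(text: str) -> int:
    """Stand-in counter for illustration only: whitespace-separated words."""
    return len(text.split())


short, too_long = split_by_length(
    ["TRUMP needs undecided voters", "word " * 300],
    whitespace_count,
)
```

With the real tokenizer, `count_tokens` could be `lambda t: len(tokenizer(t)["input_ids"])`, where `tokenizer` comes from `AutoTokenizer.from_pretrained("eevvgg/sentimenTw-political")`; that count includes the special tokens the tokenizer adds.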


	## How to Get Started with the Model

	```
	from transformers import pipeline

	# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
	model_path = "eevvgg/sentimenTw-political"
	sentiment_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)

	sequence = [
	    "TRUMP needs undecided voters",
	    # Polish: "Of course Mr. President is our national pride!!"
	    "Oczywiście ze Pan Prezydent to nasza duma narodowa!!",
	]

	# Each result is a dict with "label" and "score" keys.
	result = sentiment_task(sequence)
	labels = [i["label"] for i in result]  # ['neutral', 'positive']
	```
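
The pipeline returns one dict per input with `label` and `score` keys, so the predictions can be paired back with their texts. The helper below is a minimal sketch: `attach_predictions` is a hypothetical name, and the scores in `mock_results` are placeholders made up for illustration, not real model outputs.

```python
from typing import Dict, List, Tuple, Union

Prediction = Dict[str, Union[str, float]]


def attach_predictions(
    texts: List[str], results: List[Prediction]
) -> List[Tuple[str, str, float]]:
    """Pair each input text with its predicted label and rounded score."""
    if len(texts) != len(results):
        raise ValueError("texts and results must have the same length")
    return [
        (text, str(res["label"]), round(float(res["score"]), 3))
        for text, res in zip(texts, results)
    ]


texts = [
    "TRUMP needs undecided voters",
    "Oczywiście ze Pan Prezydent to nasza duma narodowa!!",
]
# Placeholder results in the pipeline's output format; the scores are
# invented for illustration and are NOT real model outputs.
mock_results = [
    {"label": "neutral", "score": 0.9},
    {"label": "positive", "score": 0.8},
]
rows = attach_predictions(texts, mock_results)
```

In practice, `mock_results` would be replaced by the value returned from `sentiment_task(sequence)`.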


	## Model Sources

	- Repository: [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
	- Paper: TBA

	# Training Details

	- Trained for 3 epochs, mini-batch size of 8.
	- Training loss: 0.515
	- See details in [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)
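
As a rough sense of scale, the number of optimizer steps implied by these settings can be estimated. The sketch below assumes that 1,800 of the 2k texts were used for training (the 2k sample minus the 200 evaluation texts), with no gradient accumulation and a single device; none of this is confirmed by the card, so the notebook remains the reference.

```python
import math


def optimizer_steps(n_examples: int, batch_size: int, epochs: int) -> int:
    """Number of optimizer steps, counting a final partial batch per epoch."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch * epochs


# Assumption: 2000 annotated texts minus the 200 used for evaluation.
n_train = 2000 - 200
total_steps = optimizer_steps(n_train, batch_size=8, epochs=3)  # 225 * 3 = 675
```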

	### Preprocessing

	- Hyperlinks and user mentions (@) are normalized to "http" and "@user" tokens, respectively. Extra spaces are removed.
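
The normalization described above could be approximated as follows. This is a minimal sketch, not the author's code: the regular expressions and the `normalize` name are assumptions, and the exact rules used for training are in the Colab notebook.

```python
import re

# Assumed patterns; the original preprocessing may match URLs and mentions differently.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")  # lookbehind skips e-mail addresses
SPACE_RE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Replace links with "http", mentions with "@user", and collapse whitespace."""
    text = URL_RE.sub("http", text)
    text = MENTION_RE.sub("@user", text)
    # Collapses any run of whitespace (including newlines) into one space.
    return SPACE_RE.sub(" ", text).strip()


cleaned = normalize("@JoeBiden  check   https://example.com/x now")
```

Applying the same normalization at inference time keeps inputs consistent with what the model saw during fine-tuning.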

	### Speeds, Sizes, Times

	- See [Colab notebook](https://colab.research.google.com/drive/1Rqgjp2tlReZ-hOZz63jw9cIwcZmcL9lR?usp=sharing)

	# Evaluation

	- Evaluation was run on a sample of 200 texts (10% of the data).
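
A 10% evaluation sample of this kind can be drawn with a seeded random split. The sketch below is an assumption about the procedure, not a description of it: the card does not say whether the split was random or stratified, and `holdout_split` and the placeholder texts are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def holdout_split(
    items: Sequence[str], eval_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Randomly set aside eval_fraction of items for evaluation."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_eval = round(len(items) * eval_fraction)
    eval_idx = set(indices[:n_eval])
    train = [x for i, x in enumerate(items) if i not in eval_idx]
    evaluation = [x for i, x in enumerate(items) if i in eval_idx]
    return train, evaluation


# Placeholder texts standing in for the 2k annotated sample.
texts = [f"text {i}" for i in range(2000)]
train, evaluation = holdout_split(texts)
```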

	## Results

	- accuracy: 74.0
	- macro avg:
	  - f1: 71.2
	  - precision: 72.8
	  - recall: 70.8
	- weighted avg:
	  - f1: 73.3
	  - precision: 74.0
	  - recall: 74.0


	| | precision | recall | f1-score | support |
	|---|---|---|---|---|
	| negative | 0.752 | 0.901 | 0.820 | 91 |
	| neutral | 0.764 | 0.592 | 0.667 | 71 |
	| positive | 0.667 | 0.632 | 0.649 | 38 |
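
The macro and weighted averages listed above follow from the per-class rows: a macro average is the plain mean over the three classes, while a weighted average weights each class by its support. The sketch below recomputes them from the rounded per-class values; the `report` dict and helper names are illustrative only.

```python
# Per-class values from the table: (precision, recall, f1, support).
report = {
    "negative": (0.752, 0.901, 0.820, 91),
    "neutral": (0.764, 0.592, 0.667, 71),
    "positive": (0.667, 0.632, 0.649, 38),
}

total_support = sum(row[3] for row in report.values())


def macro_avg(metric_index: int) -> float:
    """Unweighted mean of one metric over all classes, in percent."""
    values = [row[metric_index] for row in report.values()]
    return round(100 * sum(values) / len(values), 1)


def weighted_avg(metric_index: int) -> float:
    """Support-weighted mean of one metric over all classes, in percent."""
    weighted_sum = sum(row[metric_index] * row[3] for row in report.values())
    return round(100 * weighted_sum / total_support, 1)


macro = {name: macro_avg(i) for i, name in enumerate(["precision", "recall", "f1"])}
weighted = {name: weighted_avg(i) for i, name in enumerate(["precision", "recall", "f1"])}
```

The support-weighted recall equals the overall accuracy, which is why both are reported as 74.0.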



	# Citation

	BibTeX:

	```
	@misc{SentimenTwGK2023,
	author={Gajewska, Ewelina and Konat, Barbara},
	title={SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
	year={2023},
	howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
	}
	```

	APA:

	```
	Gajewska, E., & Konat, B. (2023).
	SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media.
	https://huggingface.co/eevvgg/sentimenTw-political.

	```