---
license: apache-2.0
datasets:
- Alienmaster/SB10k
- cardiffnlp/tweet_sentiment_multilingual
- legacy-datasets/wikipedia
- community-datasets/gnad10
language:
- de
base_model: dbmdz/bert-base-german-uncased
pipeline_tag: text-classification
---

## Tweet Style Classifier (German)

This model is a fine-tuned version of dbmdz/bert-base-german-uncased on a binary classification task: determining whether a German text is a tweet or not. The dataset contained about 20K instances with a 50/50 distribution between the two classes. It was shuffled with a random seed of 42 and split 80/20 into training and test sets. Training ran on an NVIDIA RTX A6000 GPU for three epochs with a batch size of 8; all other hyperparameters were the Hugging Face Trainer defaults. The model was trained to evaluate a text style transfer task that converts formal-language texts into tweets.

### How to use

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_name = "rabuahmad/tweet-style-classifier-de"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Gestern war ein schöner Tag!"
result = classifier(text)
```

Label 1 indicates that the text is predicted to be a tweet.

### Evaluation

Evaluation results on the test set:

| Metric    | Score   |
|-----------|---------|
| Accuracy  | 0.99988 |
| Precision | 0.99901 |
| Recall    | 0.99901 |
| F1        | 0.99901 |
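
### Training sketch

The following is a minimal sketch of the fine-tuning setup described above (three epochs, batch size 8, 80/20 split with seed 42, remaining Trainer defaults). The CSV path and the `text`/`label` column names are assumptions for illustration; the actual data preparation from the listed source datasets is not documented here.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base_model = "dbmdz/bert-base-german-uncased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

# Hypothetical local file with ~20K balanced examples and "text"/"label" columns.
dataset = load_dataset("csv", data_files="tweet_style_data.csv")["train"]
# Shuffle with seed 42 and split 80/20 into train/test, as described above.
dataset = dataset.shuffle(seed=42).train_test_split(test_size=0.2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="tweet-style-classifier-de",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    # All other hyperparameters are left at the Trainer defaults.
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()
```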