GPT2 Instruction Tuned English To German Headline Translation Model

  • This model is instruction-tuned on an English-to-German news headline translation dataset derived from Harvard/abc-news-dataset.
  • The German translations in the dataset were generated with LLaMA3.1 and GPT4o models.
  • This model is a fine-tuned version of raghavbali/gpt2-finetuned-headliner.

Model description

This model uses a Stanford Alpaca-style instruction tuning dataset. The format is as follows:

###Translate English Text to German:{text} ###Output: {translated_text}

The format is slightly modified from the Alpaca template to reduce the number of tokens spent on instructions, since the GPT2 context size is very limited (1024 tokens). The model is trained on a small sample of ~5k examples to showcase the impact of instruction tuning on the model's overall alignment with the requested task.
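The template above can be filled in with a few lines of Python. This is a minimal sketch, not the author's preprocessing code: the helper names are hypothetical, and ending the inference prompt directly after `###Output:` (with no trailing space) is an assumption about how the model is best prompted.

```python
# Template copied verbatim from the model card.
TEMPLATE = "###Translate English Text to German:{text} ###Output: {translated_text}"


def build_training_example(text: str, translated_text: str) -> str:
    """Format one (English, German) pair as a single training string."""
    return TEMPLATE.format(
        text=text.strip(),
        translated_text=translated_text.strip(),
    )


def build_inference_prompt(text: str) -> str:
    """Format a prompt that stops at the output marker.

    Assumption: the prompt ends right after '###Output:' so that the
    model generates the translation as its continuation.
    """
    return TEMPLATE.format(text=text.strip(), translated_text="").rstrip()
```

Keeping the instruction to a single short line leaves most of the context window for the headline and its translation, which is the trade-off the modified format is aiming for.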

Intended uses & limitations

This model is for learning purposes only. It seems to have picked up German vocabulary and sentence structure to a good extent, but the actual translations are at times grossly incorrect. The model also tends to continue the news headline given in the prompt rather than only translating it, and it has a high tendency to hallucinate.
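Because the model tends to keep writing after the translation, it can help to cap the generation length and trim the output. The following is a sketch under assumptions, not a documented usage recipe: it loads the model by its Hub ID through the Transformers `pipeline` API, uses greedy decoding, and keeps only the first line after the output marker. The helper names and the `max_new_tokens=40` default are illustrative choices.

```python
MODEL_ID = "raghavbali/gpt2-instruct-tuned-translator2"
OUTPUT_MARKER = "###Output:"


def extract_translation(generated_text: str) -> str:
    """Return the first line after the output marker, or '' if it is absent."""
    _, _, tail = generated_text.partition(OUTPUT_MARKER)
    # Drop anything after a new '###' block, in case the model starts one.
    tail = tail.split("###", 1)[0]
    lines = tail.strip().splitlines()
    return lines[0].strip() if lines else ""


def translate(headline: str, max_new_tokens: int = 40) -> str:
    """Generate a German headline; downloads the model on first use."""
    # Imported lazily so the parsing helper works without transformers.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    prompt = f"###Translate English Text to German:{headline.strip()} {OUTPUT_MARKER}"
    # Greedy decoding with a small token budget limits runaway continuations.
    result = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    return extract_translation(result[0]["generated_text"])
```

Trimming only hides the extra text; it does not make the translation itself more reliable, so outputs should still be checked by hand.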

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 4
  • num_epochs: 1
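To give a feel for what these settings imply, the sketch below computes the learning rate of a linear schedule with warmup at a given optimizer step. It assumes exactly 5,000 training samples (the card only says ~5k), a single device, and no gradient accumulation, none of which the card states; the function names are hypothetical.

```python
import math

LEARNING_RATE = 5e-05
WARMUP_STEPS = 4
TRAIN_BATCH_SIZE = 16
NUM_EPOCHS = 1
NUM_SAMPLES = 5000  # assumption: the card only says "~5k"


def total_steps(num_samples: int, batch_size: int, num_epochs: int) -> int:
    """Optimizer steps when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size) * num_epochs


def linear_lr(step: int, base_lr: float, warmup_steps: int, num_training_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, num_training_steps - step)
    return base_lr * remaining / max(1, num_training_steps - warmup_steps)


if __name__ == "__main__":
    steps = total_steps(NUM_SAMPLES, TRAIN_BATCH_SIZE, NUM_EPOCHS)
    for step in (0, 2, WARMUP_STEPS, steps // 2, steps):
        print(step, linear_lr(step, LEARNING_RATE, WARMUP_STEPS, steps))
```

Under these assumptions the run is only a few hundred steps long, so the 4 warmup steps cover a small fraction of training and the rate decays for nearly the entire epoch.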

Training results

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1

Model details

  • Model ID: raghavbali/gpt2-instruct-tuned-translator2
  • Model size: 355M params
  • Weights format: Safetensors, tensor type F32