README.md · chargoddard/servile-harpsichord-cdpo at fee98bc976752ad5883f1908060b76a1f756216d

metadata

license: cc-by-nc-4.0
datasets:
  - pankajmathur/orca_mini_v1_dataset
  - openai/summarize_from_feedback
  - PygmalionAI/PIPPA
  - chargoddard/rpguild
  - lemonilia/LimaRP
  - PKU-Alignment/PKU-SafeRLHF
  - Intel/orca_dpo_pairs
  - allenai/ultrafeedback_binarized_cleaned

Trained on a different random sample of the same datasets used by loyal-piano-m7, then further trained with cDPO (conservative DPO, i.e. DPO with label smoothing) on a blend of RLHF datasets.
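For reference, the cDPO objective can be sketched as follows. It is the standard DPO loss with the preference label smoothed by a factor `label_smoothing`, which accounts for some fraction of preference pairs being mislabeled. The card does not state the `beta` or smoothing values used for this model; the defaults below are placeholders, and the function names are hypothetical. Inputs are summed log-probabilities of the chosen and rejected responses under the trained policy and under the frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # placeholder; actual value not given in the card
    label_smoothing: float = 0.1,  # placeholder; assumed fraction of flipped labels
) -> float:
    # Implicit reward margin: how much more the policy (relative to the
    # reference model) prefers the chosen response over the rejected one.
    logits = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # With label_smoothing = 0 this reduces to the ordinary DPO loss.
    return (
        -(1 - label_smoothing) * log_sigmoid(beta * logits)
        - label_smoothing * log_sigmoid(-beta * logits)
    )
```

Setting `label_smoothing` to 0 recovers plain DPO; a positive value keeps the loss from pushing the margin toward infinity on pairs the model already separates well. Libraries such as TRL expose the same smoothing term through a `label_smoothing` option on their DPO trainer.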

Several intermediate checkpoints from the cDPO training run are available on separate branches of this repository.

Uses the Alpaca prompt format.
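A minimal sketch of building a prompt in that format, assuming the original Stanford Alpaca template (the card names the format but does not reproduce the exact preamble text, so treat the wording below as an assumption). The helper name `build_alpaca_prompt` is hypothetical.

```python
# Standard Alpaca templates (assumed; the card only says "Alpaca prompt format").
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    # Use the variant with an "### Input:" section only when extra context is given.
    if input_text:
        return ALPACA_WITH_INPUT.format(instruction=instruction, input=input_text)
    return ALPACA_NO_INPUT.format(instruction=instruction)
```

The model's reply is whatever it generates after the trailing `### Response:` line, so generation should start immediately after the returned string.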