---
license: cc-by-nc-4.0
datasets:
- pankajmathur/orca_mini_v1_dataset
- openai/summarize_from_feedback
- PygmalionAI/PIPPA
- chargoddard/rpguild
- lemonilia/LimaRP
- PKU-Alignment/PKU-SafeRLHF
- Intel/orca_dpo_pairs
- allenai/ultrafeedback_binarized_cleaned
---

Trained on a different random sampling of the same datasets used by [loyal-piano-m7](https://huggingface.co/chargoddard/loyal-piano-m7), then trained with cDPO on a blend of RLHF datasets. Several intermediate checkpoints from the cDPO training are available on branches.

Uses the Alpaca prompt format.
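As a rough illustration of the Alpaca prompt format, the sketch below builds a prompt string in Python. It assumes the standard Stanford Alpaca template wording; the card does not state the exact preamble used in training, so treat the template text as an assumption. The helper name `build_alpaca_prompt` and its `context` parameter are hypothetical and exist only for this example.

```python
# Assumed: the standard Stanford Alpaca templates. The model card only says
# "Alpaca prompt format", so the exact preamble wording is not confirmed.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{context}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, context: str = "") -> str:
    """Hypothetical helper: format an instruction (and optional input) as an Alpaca prompt."""
    instruction = instruction.strip()
    context = context.strip()
    if context:
        # Use the three-section template when extra input text is supplied.
        return PROMPT_WITH_INPUT.format(instruction=instruction, context=context)
    return PROMPT_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    print(build_alpaca_prompt("Summarize the plot of Hamlet in two sentences."))
```

The returned string ends with `### Response:` and a newline, so the model's generated text continues directly from that point.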