---
license: cc-by-nc-4.0
datasets:
- pankajmathur/orca_mini_v1_dataset
- openai/summarize_from_feedback
- PygmalionAI/PIPPA
- chargoddard/rpguild
- lemonilia/LimaRP
- PKU-Alignment/PKU-SafeRLHF
- Intel/orca_dpo_pairs
- allenai/ultrafeedback_binarized_cleaned
---

Trained on a different random sample of the same datasets used by [loyal-piano-m7](https://huggingface.co/chargoddard/loyal-piano-m7), then further trained with cDPO on a blend of RLHF datasets.
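For readers unfamiliar with the objective, here is a minimal sketch of a per-example cDPO loss, assuming "cDPO" refers to conservative DPO (DPO with label smoothing on the preference labels). The function names and the `beta` and `label_smoothing` defaults are illustrative assumptions, not the values used to train this model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # assumed value, not from the model card
    label_smoothing: float = 0.1,  # assumed value, not from the model card
) -> float:
    """Per-example conservative DPO loss (illustrative sketch).

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model.
    """
    # Difference of policy-vs-reference log-ratios for chosen and rejected.
    logits = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # With label_smoothing = 0 this reduces to the standard DPO loss.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(beta * logits)
        - label_smoothing * log_sigmoid(-beta * logits)
    )
```

The smoothing term treats each preference label as possibly flipped with probability `label_smoothing`, which is why the loss does not push the margin between chosen and rejected responses toward infinity.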

Several intermediate checkpoints from cDPO training are available on branches of this repository.

Uses the Alpaca prompt format.
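A minimal sketch of building a prompt in that format, assuming the standard Stanford Alpaca template. The exact preamble wording used during training is not stated here, and `build_alpaca_prompt` is a hypothetical helper name.

```python
# Standard Alpaca templates (assumed; the card does not spell them out).
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Return an Alpaca-style prompt; the model's reply follows '### Response:'."""
    if input_text:
        return ALPACA_WITH_INPUT.format(instruction=instruction, input=input_text)
    return ALPACA_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    print(build_alpaca_prompt("Write a haiku about autumn."))
```

The generated text should be read from after the final `### Response:` line; stopping generation at the next `### Instruction:` marker is a common way to avoid run-on output.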