Instruction Tuning Datasets
For both SFT and DPO

tatsu-lab/alpaca • Updated May 22, 2023 • 52k • 88.1k • 678
elichen3051/alpaca52k-alignment-handbook • Updated Jun 7 • 52k • 2 • 1
yahma/alpaca-cleaned • Updated Apr 10, 2023 • 51.8k • 41.1k • 547
HuggingFaceH4/ultrachat_200k • Updated Feb 22 • 515k • 30.3k • 442

Pre-Training Datasets

allenai/c4 • Updated Jan 9 • 10.4B • 245k • 277
allenai/dolma • Updated Apr 17 • 786 • 790
togethercomputer/RedPajama-Data-1T • Updated Jun 17 • 1.73M • 131k • 1.05k
tiiuae/falcon-refinedweb • Updated Jun 20, 2023 • 968M • 1.89k • 797