abideen committed
Commit 78b8357 (1 parent: 970b275)

Update README.md

Files changed (1)
  1. README.md +40 -38
README.md CHANGED
@@ -1,59 +1,61 @@
  ---
  language:
  - en
- license: apache-2.0
- library_name: transformers
  ---
- # **ORPO**
-
- This is the official repository for <a class="link" href="https://arxiv.org/abs/2403.07691">**Reference-free Monolithic Preference Optimization with Odds Ratio**</a>. The detailed results in the paper can be found in:
- - [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kaist-ai%2Fmistral-orpo-beta)
- - [AlpacaEval](#alpacaeval)
- - [MT-Bench](#mt-bench)
- - [IFEval](#ifeval)
-
- &nbsp;
-
- ### **`Model Checkpoints`**
-
- Our models trained with ORPO can be found in:
-
- - [X] **Mistral-ORPO-⍺**: 🤗 <a class="link" href="https://huggingface.co/kaist-ai/mistral-orpo-alpha">kaist-ai/mistral-orpo-alpha</a>
- - [X] **Mistral-ORPO-β**: 🤗 <a class="link" href="https://huggingface.co/kaist-ai/mistral-orpo-beta">kaist-ai/mistral-orpo-beta</a>
-
- And the corresponding logs for the average log probabilities of chosen/rejected responses during training are reported in:
-
- - [X] **Mistral-ORPO-⍺**: <a class="link" href="https://wandb.ai/jiwooya1000/PREF/reports/Mistral-ORPO-7B-Training-Log--Vmlldzo3MTE1NzE0?accessToken=rms6o4mg5vo3feu1bvbpk632m4cspe19l0u1p4he3othx5bgean82chn9neiile6">Wandb Report for Mistral-ORPO-⍺</a>
- - [X] **Mistral-ORPO-β**: <a class="link" href="https://wandb.ai/jiwooya1000/PREF/reports/Mistral-ORPO-7B-Training-Log--Vmlldzo3MTE3MzMy?accessToken=dij4qbp6dcrofsanzbgobjsne9el8a2zkly2u5z82rxisd4wiwv1rhp0s2dub11e">Wandb Report for Mistral-ORPO-β</a>
-
- &nbsp;
-
- ### **`AlpacaEval`**
-
- <figure>
- <img class="png" src="/assets/img/alpaca_blog.png" alt="Description of the image">
- <figcaption><b>Figure 1.</b> AlpacaEval 2.0 score for the models trained with different alignment methods.</figcaption>
- </figure>
-
- &nbsp;
-
- ### **`MT-Bench`**
-
- <figure>
- <img class="png" src="/assets/img/mtbench_hf.png" alt="Description of the image">
- <figcaption><b>Figure 2.</b> MT-Bench result by category.</figcaption>
- </figure>
-
- &nbsp;
-
- ### **`IFEval`**
-
- IFEval scores are measured with <a class="link" href="https://github.com/EleutherAI/lm-evaluation-harness">EleutherAI/lm-evaluation-harness</a> by applying the chat template. The scores for Llama-2-Chat (70B), Zephyr-β (7B), and Mixtral-8X7B-Instruct-v0.1 are originally reported in <a class="link" href="https://twitter.com/wiskojo/status/1739767758462877823">this tweet</a>.
-
- | **Model Type**                 | **Prompt-Strict** | **Prompt-Loose** | **Inst-Strict** | **Inst-Loose** |
- |--------------------------------|:-----------------:|:----------------:|:---------------:|:--------------:|
- | **Llama-2-Chat (70B)**         | 0.4436            | 0.5342           | 0.5468          | 0.6319         |
- | **Zephyr-β (7B)**              | 0.4233            | 0.4547           | 0.5492          | 0.5767         |
- | **Mixtral-8X7B-Instruct-v0.1** | 0.5213            | **0.5712**       | 0.6343          | **0.6823**     |
- | **Mistral-ORPO-⍺ (7B)**        | 0.5009            | 0.5083           | 0.5995          | 0.6163         |
- | **Mistral-ORPO-β (7B)**        | **0.5287**        | 0.5564           | **0.6355**      | 0.6619         |
  ---
+ library_name: transformers
+ license: apache-2.0
+ datasets:
+ - argilla/dpo-mix-7k
  language:
  - en
  ---
+
+ # Phi2-PRO
+
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/QEQjVaXVqAjw4eSCAMnkv.jpeg)
+
+ *phi2-pro* is a fine-tuned version of **[microsoft/phi-2](https://huggingface.co/microsoft/phi-2)** on the **[argilla/dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k)**
+ preference dataset using *Odds Ratio Preference Optimization (ORPO)*. The model has been trained for 1 epoch.
+
+ ## LazyORPO
+
+ This model has been trained using **[LazyORPO](https://colab.research.google.com/drive/19ci5XIcJDxDVPY2xC1ftZ5z1kc2ah_rx?usp=sharing)**, a Colab notebook that makes the training process much easier. It is based on the [ORPO paper](https://huggingface.co/papers/2403.07691).
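
For readers who want a script rather than the notebook, below is a minimal sketch of what an equivalent ORPO fine-tuning run could look like with TRL's `ORPOTrainer`. The model and dataset names mirror this card, but the trainer choice and every hyperparameter are illustrative assumptions, not the exact LazyORPO settings.

```python
# Hedged sketch of ORPO fine-tuning with TRL (not the exact LazyORPO recipe).
# Assumes recent versions of transformers, datasets, and trl are installed.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# argilla/dpo-mix-7k stores chosen/rejected as chat-style message lists; depending
# on the TRL version, they may first need to be flattened into plain
# prompt/chosen/rejected text columns.
dataset = load_dataset("argilla/dpo-mix-7k", split="train")

config = ORPOConfig(
    output_dir="phi2-pro-orpo",
    beta=0.1,                      # weight of the odds-ratio term (lambda in the paper)
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    max_length=1024,
    max_prompt_length=512,
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```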
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/2h3guPdFocisjFClFr0Kh.png)
+
+ #### What is ORPO?
+
+ Odds Ratio Preference Optimization (ORPO) is a new method for training LLMs that combines SFT and alignment into a single objective (loss function), achieving state-of-the-art results; a minimal sketch of that objective follows the list below.
+ Some highlights of this technique are:
+
+ * 🧠 Reference-model-free and memory-friendly
+ * 🔄 Replaces SFT+DPO/PPO with a single method (ORPO)
+ * 🏆 ORPO outperforms SFT and SFT+DPO on Phi-2, Llama 2, and Mistral
+ * 📊 Mistral-ORPO achieves 12.20% on AlpacaEval 2.0, 66.19% on IFEval, and 7.32 on MT-Bench, outperforming Hugging Face's Zephyr Beta
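
In loss terms, ORPO keeps the usual negative log-likelihood on the chosen response and adds a log-odds-ratio penalty that pushes the chosen response above the rejected one. A small PyTorch sketch of that combined objective, written from the paper's description with illustrative variable names, could look like this:

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, lam=0.1):
    """Sketch of the ORPO objective for one batch.

    chosen_logps / rejected_logps: average per-token log-probabilities of the
    chosen and rejected responses under the policy model, shape [batch].
    lam: weight of the odds-ratio term (lambda in the paper).
    """
    # log-odds of a response: log(p / (1 - p)), computed in log space
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # relative preference term: log-sigmoid of the log-odds ratio
    preference = F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # standard SFT (NLL) term on the chosen response
    nll = -chosen_logps

    # total loss: SFT term plus the weighted odds-ratio penalty
    return (nll - lam * preference).mean()
```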
+
+ #### Usage
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ torch.set_default_device("cuda")
+
+ # Load the fine-tuned model and tokenizer from the Hub
+ model = AutoModelForCausalLM.from_pretrained("abideen/phi2-pro", torch_dtype="auto", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("abideen/phi2-pro", trust_remote_code=True)
+
+ # Tokenize a plain-text prompt
+ inputs = tokenizer('''
+ """
+ Write a detailed analogy between mathematics and a lighthouse.
+ """''', return_tensors="pt", return_attention_mask=False)
+
+ # Generate and decode the completion
+ outputs = model.generate(**inputs, max_length=200)
+ text = tokenizer.batch_decode(outputs)[0]
+ print(text)
+ ```
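
Because phi2-pro was preference-tuned on conversational data, chat-style prompting may work better than the raw string above. The snippet below continues from the previous one and assumes the repository's tokenizer ships a chat template, which is worth verifying before relying on it:

```python
# Optional: chat-style prompting via the tokenizer's chat template (assumed to exist).
messages = [
    {"role": "user", "content": "Write a detailed analogy between mathematics and a lighthouse."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```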
 
+ ## Evaluation
+
+ ### COMING SOON