Independently Benchmarked Humaneval and Evalplus scores

#13
by VaibhavSahai - opened

Below are independent benchmark runs of Llama on code performance (HumanEval and the more rigorous EvalPlus).
[Image: Screenshot 2024-07-23 at 11.38.21 AM.png]
Congrats Meta team!

VaibhavSahai changed discussion title from Independent Benchmarked Humaneval and Evalplus scores to Independently Benchmarked Humaneval and Evalplus scores

Also, the logic performance is nowhere near the numbers they posted, at least in my own tests.

Are you referring to the GSM8K and MMLU scores?
