Independently Benchmarked Humaneval and Evalplus scores

#13

by VaibhavSahai - opened Jul 23

Jul 23

Below are independent benchmark runs of Llama's code performance (HumanEval and the more rigorous EvalPlus).

Congrats Meta team!

VaibhavSahai changed discussion title from Independent Benchmarked Humaneval and Evalplus scores to Independently Benchmarked Humaneval and Evalplus scores Jul 23

ZeroWw

Jul 23

Also, the logic is nowhere near the numbers they posted, at least in my own tests.

VaibhavSahai

Jul 23

Are you referring to the GSM8K and MMLU scores?
