Evaluation results
- judge_match on squad_answerable (self-reported): 0.624
- judge_match on context_has_answer (self-reported): 0.849
- judge_match on jail_break (self-reported): 0.076
- judge_match on harmless_prompt (self-reported): 0.883
- judge_match on harmful_prompt (self-reported): 0.409
- acc on truthfulqa (self-reported): 0.525
- exact_match on gsm8k (self-reported): 0.603
- acc on mmlu (self-reported): 0.625