Sinisa Stanivuk

Stopwolf

AI & ML interests

Multilingual LLMs, STT and TTS models

🇷🇸 New Benchmark for Serbian Language 🇷🇸

@DjMel and I recently released a new benchmark for the Serbian language that measures the general knowledge of LLMs. To build it, we parsed over 20 years of University of Belgrade entrance exams, which makes for a high-quality dataset.

🥇 OpenAI models still hold the podium places, with a significant gap to open-source models
🤔 Qwen/Qwen2-7B-Instruct and VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct show promising results, considering they weren't trained on Serbian
📈 The best open-source model appears to be Stopwolf/Mustra-7B-Instruct-v0.2, a merge of gordicaleksa/YugoGPT and mistralai/Mistral-7B-Instruct-v0.2
📉 Some models, such as google/gemma-2-9b-it, turned out to be a disappointment, with accuracy close to random guessing
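What does "close to random guessing" mean? Suppose each exam question has k answer options and exactly one correct answer. Uniform guessing then scores 1/k on average. The Python sketch below is a hypothetical illustration, not the oz-eval evaluation code. It computes a model's accuracy, the chance baseline, and a one-sided exact binomial p-value for whether the score beats chance. The function names and the number of answer options are assumptions, because the post does not state them.

```python
from math import comb


def accuracy(predictions, answers):
    """Fraction of questions where the predicted option equals the gold option."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("need equal-length, non-empty predictions and answers")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def chance_accuracy(num_options):
    """Expected accuracy of uniform random guessing (assumes one correct option)."""
    return 1.0 / num_options


def p_value_above_chance(num_correct, num_questions, num_options):
    """One-sided exact binomial test: P(X >= num_correct) under random guessing."""
    p = chance_accuracy(num_options)
    return sum(
        comb(num_questions, k) * p**k * (1 - p) ** (num_questions - k)
        for k in range(num_correct, num_questions + 1)
    )
```

Interpret the output as follows. A score near 1/k with a large p-value is what "random guessing-like accuracy" looks like. A small p-value indicates the model is doing better than chance.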

Take a look at the full results on the dataset page:
DjMel/oz-eval

P.S. If you have any constructive criticism or ideas for improvement, feel free to use the dataset's Discussions page!