David Golchinfar PRO

DavidGF

AI & ML interests

finetune llms, improve german language understanding and generated text of llms

Organizations

DavidGF's activity

posted an update 4 months ago
view post
Post
1301
Introducing Kraken-LoRA – a lightweight version of Kraken that uses LoRA-Adapters as Experts based on the base model.

@fernandofernandes , me, @Crystalcareai , @ehartford created the Kraken-LoRA!

πŸ” What’s the big deal?

βœ… Size Consistency: While Kraken’s size increases with more Experts, Kraken-LoRA remains as compact as the base model (e.g., 8b if you use Meta-Llama3-8b-Instruct).
βœ… VRAM Efficiency: Kraken-LoRA is highly VRAM efficient, maintaining the power of all experts without the bloat.
βœ… Dynamic Adaptation: LoRA adapters are applied dynamically at runtime, following the routing process.
βœ… High Efficiency: Enjoy increased efficiency without compromising performance, as long as the LoRA adapters match the base model.

πŸ’‘ Conclusion: Kraken-LoRA empowers businesses to experience enhanced flexibility and performance from our architecture, enabling further scalability without sacrificing performance.

Check out the model here: VAGOsolutions/Kraken-LoRA
Explore the code here: https://github.com/cognitivecomputations/kraken/tree/main/Kraken-LoRA

Have fun with Kraken-LoRA! πŸ™
posted an update 4 months ago
view post
Post
1463
The kraken has awakened!
A Game-Changer in LLM Flexibility and Performance!

Over the past few weeks, VAGO solutions teamed up with Cognitive Computations and HyperSpace to develop a groundbreaking architecture that redefines flexibility in combining different LLM into one model.

@fernandofernandes , me, @Crystalcareai , @ehartford created the Kraken!

What Can It Do? πŸ™
βœ… Versatile Architecture: Kraken allows the seamless combination of LLMs with varying sizes, quantizations, and model architectures. It currently supports quantizations in 4-bit, 8-bit, and AWQ, with more on the way. And it runs on Hugging Face Transformers 4.40+

βœ… Kraken Router: Utilizing a custom sequence classification model with a context length of 32k tokens, The Kraken Router directs inputs to the most suitable Expert based on their characteristics.

βœ… Adaptability: Enhanced input formatting supports the model’s adaptability to diverse conversational contexts.

βœ… Extreme Versatility: Easily swap experts within Kraken for your specific use cases without retraining the entire model. For example, if you've built a Kraken for coding in Python you can upgrade your Python model without retraining the router or add a C# model by retraining the router.

βœ… Open Source Pipeline: We’re sharing the entire pipeline, including router creation, training, architecture setup, and Kraken inference, on JupyterNotebooks: https://github.com/cognitivecomputations/kraken

Kraken marks the beginning of an exciting new journey in #OpenSource LLM. Why? Because it empowers the open source community in accelerating the catch-up process to proprietary LLMs like #GPT and #Claude 🀩

We proudly introduce the very first 2 Kraken models, that integrates top-tier LLM and Multilingual capabilities:
cognitivecomputations/Kraken
VAGOsolutions/Kraken-Multilingual
Right now it's supported by Hugging Face transformers library. Would love to see the integration into VLM and TGWI!
posted an update 5 months ago
view post
Post
1734
Please... feed this Llama some Sauerkraut! 🍲

Said and done. Here it is. Our Sauerkraut Version of the strong Llama3-8b by Meta. Released from HANNOVER MESSE, just in front of meta booth.
VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct

According to benchmarks (LM-Evaluation-Harness 0.4.2), our #SauerkrautLM Dataset and fine-tuning pipeline improved the Model noticeably (AVG = 74,57), especially Reasoning and Common Sense capabilities.

Again we provide some more detail on the whole process:
βœ… Original model: Llama-3-8b-Instruct
βœ… Training Duration: 12 hours
βœ… Training procedure: 2-staged DPO
βœ… Trained data: 70k (first stage) and 20k (second stage)
βœ… GPU: 4x RTX6000 ADA
βœ… New model: Llama-3-SauerkrautLM-8b-Instruct
βœ… Total training costs: 54,72 Dollar πŸ’΄ - RunPod FTW (excluding synthesizing data, curating data, benchmarks, error handling, testing)

See our model card on Hugging Face for more details: VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct

There will be more details on benchmarks during the next days.
posted an update 5 months ago
view post
Post
3044
"How expensive is it actually to teach a #LanguageModel German through #finetuning πŸ’°πŸ’°πŸ’°? We get asked this quite often.

There is no one-size-fits-all answer to this question, as among other factors:
⏹ each fine-tuning is different,
⏹ the hardware used can be a major cost driver,
⏹ the amount and type of training data can extend the process,
⏹ and the skills to be trained can increase the difficulty of fine-tuning.

However, we have broken down the costs incurred for our latest fine-tune ( VAGOsolutions/SauerkrautLM-Qwen-32b)


Base model: Qwen/Qwen1.5-32B
Fine-Tuning Goal: Train German language
Training dataset size: 160,000 SFT data / 110,000 DPO data
Training duration: 72.5 hours (2 epochs SFT / 1 epoch DPO)
GPU: 2x A100 SXM
New model: VAGOsolutions/SauerkrautLM-Qwen-32b

Total cost: 312 euros πŸ’Ά

These are quite reasonable training costs considering the model now speaks passable German (previously very broken). Depending on the use case and process requirements, this can even be a real alternative to the costly continuous pre-training of foreign language models.