DavidGF (David Golchinfar)

posted an update 4 months ago

Post

1301

Introducing Kraken-LoRA – a lightweight version of Kraken that uses LoRA-Adapters as Experts based on the base model.

@fernandofernandes , me, @Crystalcareai , @ehartford created the Kraken-LoRA!

🔍 What’s the big deal?

✅ Size Consistency: While Kraken’s size increases with more Experts, Kraken-LoRA remains as compact as the base model (e.g., 8b if you use Meta-Llama3-8b-Instruct).
✅ VRAM Efficiency: Kraken-LoRA is highly VRAM efficient, maintaining the power of all experts without the bloat.
✅ Dynamic Adaptation: LoRA adapters are applied dynamically at runtime, following the routing process.
✅ High Efficiency: Enjoy increased efficiency without compromising performance, as long as the LoRA adapters match the base model.

💡 Conclusion: Kraken-LoRA empowers businesses to experience enhanced flexibility and performance from our architecture, enabling further scalability without sacrificing performance.

Check out the model here: VAGOsolutions/Kraken-LoRA
Explore the code here: https://github.com/cognitivecomputations/kraken/tree/main/Kraken-LoRA

Have fun with Kraken-LoRA! 🐙

posted an update 4 months ago

Post

1463

The kraken has awakened!
A Game-Changer in LLM Flexibility and Performance!

Over the past few weeks, VAGO solutions teamed up with Cognitive Computations and HyperSpace to develop a groundbreaking architecture that redefines flexibility in combining different LLM into one model.

@fernandofernandes , me, @Crystalcareai , @ehartford created the Kraken!

What Can It Do? 🐙
✅ Versatile Architecture: Kraken allows the seamless combination of LLMs with varying sizes, quantizations, and model architectures. It currently supports quantizations in 4-bit, 8-bit, and AWQ, with more on the way. And it runs on Hugging Face Transformers 4.40+

✅ Kraken Router: Utilizing a custom sequence classification model with a context length of 32k tokens, The Kraken Router directs inputs to the most suitable Expert based on their characteristics.

✅ Adaptability: Enhanced input formatting supports the model’s adaptability to diverse conversational contexts.

✅ Extreme Versatility: Easily swap experts within Kraken for your specific use cases without retraining the entire model. For example, if you've built a Kraken for coding in Python you can upgrade your Python model without retraining the router or add a C# model by retraining the router.

✅ Open Source Pipeline: We’re sharing the entire pipeline, including router creation, training, architecture setup, and Kraken inference, on JupyterNotebooks: https://github.com/cognitivecomputations/kraken

Kraken marks the beginning of an exciting new journey in #OpenSource LLM. Why? Because it empowers the open source community in accelerating the catch-up process to proprietary LLMs like #GPT and #Claude 🤩

We proudly introduce the very first 2 Kraken models, that integrates top-tier LLM and Multilingual capabilities:
cognitivecomputations/Kraken
VAGOsolutions/Kraken-Multilingual
Right now it's supported by Hugging Face transformers library. Would love to see the integration into VLM and TGWI!

posted an update 5 months ago

Post

1734

Please... feed this Llama some Sauerkraut! 🍲

Said and done. Here it is. Our Sauerkraut Version of the strong Llama3-8b by Meta. Released from HANNOVER MESSE, just in front of meta booth.
VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct

According to benchmarks (LM-Evaluation-Harness 0.4.2), our #SauerkrautLM Dataset and fine-tuning pipeline improved the Model noticeably (AVG = 74,57), especially Reasoning and Common Sense capabilities.

Again we provide some more detail on the whole process:
✅ Original model: Llama-3-8b-Instruct
✅ Training Duration: 12 hours
✅ Training procedure: 2-staged DPO
✅ Trained data: 70k (first stage) and 20k (second stage)
✅ GPU: 4x RTX6000 ADA
✅ New model: Llama-3-SauerkrautLM-8b-Instruct
✅ Total training costs: 54,72 Dollar 💴 - RunPod FTW (excluding synthesizing data, curating data, benchmarks, error handling, testing)

See our model card on Hugging Face for more details: VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct

There will be more details on benchmarks during the next days.

posted an update 5 months ago

Post

3044

"How expensive is it actually to teach a #LanguageModel German through #finetuning 💰💰💰? We get asked this quite often.

There is no one-size-fits-all answer to this question, as among other factors:
⏹ each fine-tuning is different,
⏹ the hardware used can be a major cost driver,
⏹ the amount and type of training data can extend the process,
⏹ and the skills to be trained can increase the difficulty of fine-tuning.

However, we have broken down the costs incurred for our latest fine-tune ( VAGOsolutions/SauerkrautLM-Qwen-32b)

Base model: Qwen/Qwen1.5-32B
Fine-Tuning Goal: Train German language
Training dataset size: 160,000 SFT data / 110,000 DPO data
Training duration: 72.5 hours (2 epochs SFT / 1 epoch DPO)
GPU: 2x A100 SXM
New model: VAGOsolutions/SauerkrautLM-Qwen-32b

Total cost: 312 euros 💶

These are quite reasonable training costs considering the model now speaks passable German (previously very broken). Depending on the use case and process requirements, this can even be a real alternative to the costly continuous pre-training of foreign language models.

David Golchinfar PRO

AI & ML interests

Organizations

DavidGF's activity