Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 93
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated 1 day ago • 126
Llama3-8B-1.58 Collection A trio of powerful models: fine-tuned from Llama3-8b-Instruct, with BitNet architecture! • 3 items • Updated 5 days ago • 8
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated 7 days ago • 53
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis Paper • 2409.06135 • Published 10 days ago • 14
view article Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28 • 146
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding Paper • 2408.15545 • Published 22 days ago • 32
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published 23 days ago • 137
Cerebras DocChat Collection GPT-4 Level Conversational QA Trained In a Few Hours • 5 items • Updated 29 days ago • 3
Llama-3.1 Quantization Collection Neural Magic quantized Llama-3.1 models • 21 items • Updated 8 days ago • 32
Minitron Collection A family of compressed models obtained via pruning and knowledge distillation • 7 items • Updated 2 days ago • 54
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12 • 55
🦅 🐍 FalconMamba 7B Collection This collection features the FalconMamba 7B base model, the instruction-tuned version, their 4-bit and GGUF variants, and the demo. • 13 items • Updated 1 day ago • 25
Arctic-embed Collection A collection of text embedding models optimized for retrieval accuracy and efficiency • 6 items • Updated Jul 18 • 14
view article Article The case for specialized pre-training: ultra-fast foundation models for dedicated tasks By Pclanglais • Aug 4 • 24
Improving Retrieval Augmented Language Model with Self-Reasoning Paper • 2407.19813 • Published Jul 29 • 6
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12 • 61
Probabilistic Programming with Programmable Variational Inference Paper • 2406.15742 • Published Jun 22 • 2
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 84
view article Article 🔥 Argilla 2.0: the data-centric tool for AI makers 🤗 By dvilasuero • Jul 30 • 31
Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows Paper • 2406.16218 • Published Jun 23 • 1
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26 • 31
MambaVision Collection MambaVision: A Hybrid Mamba-Transformer Vision Backbone. Includes tiny, tiny2, small, base, large and large2 variants. • 8 items • Updated Jul 24 • 12
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 6 items • Updated Jul 21 • 53
TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON Paper • 2407.15734 • Published Jul 22 • 1
Llama 3.1 Collection This collection hosts the transformers and original repos of the Meta Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Aug 2 • 568
Fast Matrix Multiplications for Lookup Table-Quantized LLMs Paper • 2407.10960 • Published Jul 15 • 10
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16 • 43
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 169
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12 • 122
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models Paper • 2309.03883 • Published Sep 7, 2023 • 33
view article Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Jun 24 • 166
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 30 • 5
HyperZcdotZcdotW Operator Connects Slow-Fast Networks for Full Context Interaction Paper • 2401.17948 • Published Jan 31 • 2
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data Paper • 2406.14546 • Published Jun 20 • 1
Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models Paper • 2406.14599 • Published Jun 20 • 16