sugatoray (Sugato Ray)

upvoted a paper about 5 hours ago

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28 • 93

upvoted a collection about 5 hours ago

Moshi v0.1 Release

Collection

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated 1 day ago • 126

upvoted a collection 1 day ago

Llama3-8B-1.58

Collection

A trio of powerful models: fine-tuned from Llama3-8b-Instruct, with BitNet architecture! • 3 items • Updated 5 days ago • 8

upvoted an article 1 day ago

Article

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

1 day ago

• 94

upvoted a collection 3 days ago

Core ML Segment Anything 2

Collection

4 items • Updated 6 days ago • 13

upvoted 2 articles 5 days ago

Article

Accelerate 1.0.0

7 days ago

• 31

Article

Introduction to ggml

Aug 13

• 91

upvoted a collection 6 days ago

DataGemma Release

Collection

A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated 7 days ago • 53

upvoted a paper 9 days ago

Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis

Paper • 2409.06135 • Published 10 days ago • 14

upvoted an article 13 days ago

Article

Training and Finetuning Embedding Models with Sentence Transformers v3

May 28

• 146

upvoted 2 papers 17 days ago

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Paper • 2408.15545 • Published 22 days ago • 32

Advancing LLM Reasoning Generalists with Preference Trees

Paper • 2404.02078 • Published Apr 2 • 43

upvoted a paper 20 days ago

Scaling Up Diffusion and Flow-based XGBoost Models

Paper • 2408.16046 • Published 22 days ago • 8

upvoted a paper 21 days ago

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published 23 days ago • 137

upvoted a collection 27 days ago

Open-source AI Releases - August '24

Collection

8 items • Updated 28 days ago • 3

upvoted a collection 28 days ago

Cerebras DocChat

Collection

GPT-4 Level Conversational QA Trained In a Few Hours • 5 items • Updated 29 days ago • 3

upvoted 2 collections about 1 month ago

Llama-3.1 Quantization

Collection

Neural Magic quantized Llama-3.1 models • 21 items • Updated 8 days ago • 32

Minitron

Collection

A family of compressed models obtained via pruning and knowledge distillation • 7 items • Updated 2 days ago • 54

upvoted a paper about 1 month ago

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Paper • 2408.06195 • Published Aug 12 • 55

upvoted 2 collections about 1 month ago

🦅 🐍 FalconMamba 7B

Collection

This collection features the FalconMamba 7B base model, the instruction-tuned version, their 4-bit and GGUF variants, and the demo. • 13 items • Updated 1 day ago • 25

LLMs + Mamba

Collection

4 items • Updated Aug 14 • 1

upvoted an article about 1 month ago

Article

Welcome FalconMamba: The first strong attention-free 7B model

Aug 12

• 96

upvoted a collection about 1 month ago

Arctic-embed

Collection

A collection of text embedding models optimized for retrieval accuracy and efficiency • 6 items • Updated Jul 18 • 14

upvoted 2 articles about 1 month ago

Article

A Complete Guide to Audio Datasets

Dec 15, 2022

• 16

Article

The case for specialized pre-training: ultra-fast foundation models for dedicated tasks

Aug 4

• 24

upvoted a collection about 1 month ago

tuning

Collection

50 items • Updated Aug 17 • 4

upvoted a paper about 1 month ago

Improving Retrieval Augmented Language Model with Self-Reasoning

Paper • 2407.19813 • Published Jul 29 • 6

upvoted a collection about 1 month ago

LLMs

Collection

200 items • Updated about 14 hours ago • 14

upvoted a paper about 1 month ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12 • 61

upvoted an article about 1 month ago

Article

XetHub is joining Hugging Face!

Aug 8

• 76

upvoted 2 papers about 2 months ago

Probabilistic Programming with Programmable Variational Inference

Paper • 2406.15742 • Published Jun 22 • 2

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25 • 84

upvoted a collection about 2 months ago

LLM2Vec

Collection

13 items • Updated Jun 28 • 31

upvoted 2 articles about 2 months ago

Article

Memory-efficient Diffusion Transformers with Quanto and Diffusers

Jul 30

• 50

Article

🔥 Argilla 2.0: the data-centric tool for AI makers 🤗

Jul 30

• 31

upvoted 2 papers about 2 months ago

Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows

Paper • 2406.16218 • Published Jun 23 • 1

AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26 • 31

upvoted 2 collections about 2 months ago

MambaVision

Collection

MambaVision: A Hybrid Mamba-Transformer Vision Backbone. Includes tiny, tiny2, small, base, large and large2 variants. • 8 items • Updated Jul 24 • 12

NuminaMath

Collection

Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 6 items • Updated Jul 21 • 53

upvoted 2 papers about 2 months ago

TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON

Paper • 2407.15734 • Published Jul 22 • 1

LAMBDA: A Large Model Based Data Agent

Paper • 2407.17535 • Published Jul 24 • 34

upvoted a collection about 2 months ago

Llama 3.1

Collection

This collection hosts the transformers and original repos of the Meta Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Aug 2 • 568

upvoted 2 articles about 2 months ago

Article

How NuminaMath Won the 1st AIMO Progress Prize

Jul 11

• 92

Article

WWDC 24: Running Mistral 7B with Core ML

Jul 22

• 54

upvoted a paper about 2 months ago

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Paper • 2407.10960 • Published Jul 15 • 10

upvoted a collection about 2 months ago

DCLM

Collection

DCLM Models + Datasets • 7 items • Updated Jul 22 • 38

upvoted an article 2 months ago

Article

How we leveraged distilabel to create an Argilla 2.0 Chatbot

Jul 16

• 30

upvoted a paper 2 months ago

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

Paper • 2407.11963 • Published Jul 16 • 43

upvoted a collection 2 months ago

🪐 SmolLM

Collection

A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 169

upvoted an article 2 months ago

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16

• 242

upvoted 2 papers 2 months ago

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Paper • 2407.09025 • Published Jul 12 • 122

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Paper • 2309.03883 • Published Sep 7, 2023 • 33

upvoted 2 articles 3 months ago

Article

quanto: a pytorch quantization toolkit

Mar 18

• 28

Article

Welcome Gemma 2 - Google's new open LLM

Jun 27

• 115

upvoted a collection 3 months ago

Gemma 2 Release

Collection

15 items • Updated 10 days ago • 166

upvoted an article 3 months ago

Article

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

Jun 24

• 166

upvoted 4 papers 3 months ago

Grokfast: Accelerated Grokking by Amplifying Slow Gradients

Paper • 2405.20233 • Published May 30 • 5

HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction

Paper • 2401.17948 • Published Jan 31 • 2

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

Paper • 2406.14546 • Published Jun 20 • 1

Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models

Paper • 2406.14599 • Published Jun 20 • 16
