Tanvir1337 (Tanvir)

upvoted an article 6 days ago

Article

🧨 Diffusers welcomes Stable Diffusion 3

Jun 12

• 84

upvoted a collection 6 days ago

Leaderboards and benchmarks ✨

Collection

Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 67 items • Updated Aug 6 • 83

upvoted a collection 7 days ago

DataGemma Release

Collection

A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated 7 days ago • 53

upvoted an article 15 days ago

Article

🪆 Introduction to Matryoshka Embedding Models

Feb 23

• 46

upvoted a collection 15 days ago

OLMoE

Collection

Artifacts for open mixture-of-experts language models. • 13 items • Updated 5 days ago • 18

upvoted an article 28 days ago

Article

The 5 Most Under-Rated Tools on Hugging Face

29 days ago

• 74

upvoted a paper about 1 month ago

Qwen2-Audio Technical Report

Paper • 2407.10759 • Published Jul 15 • 52

upvoted an article about 1 month ago

Article

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

Jul 31

• 58

upvoted 2 articles about 2 months ago

Article

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

Aug 17, 2022

• 56

Article

Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳

Aug 25, 2023

• 17

upvoted 2 papers about 2 months ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12 • 61

The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink

Paper • 2204.05149 • Published Apr 11, 2022 • 4

upvoted a collection about 2 months ago

Gemma Scope Release

Collection

A comprehensive, open suite of sparse autoencoders for Gemma 2 2B and 9B. • 10 items • Updated Aug 11 • 13

upvoted 2 articles about 2 months ago

Article

SegMoE: Segmind Mixture of Diffusion Experts

Feb 3

• 6

Article

Mixture of Experts Explained

Dec 11, 2023

• 157

upvoted 2 collections about 2 months ago

Llama 3.1 Evals

Collection

This collection provides detailed information on how we derived the reported benchmark metrics for the Llama 3.1 models, including the configurations, • 6 items • Updated Aug 2 • 14

Model Merging

Collection

Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 30 items • Updated Jun 12 • 211

upvoted 2 articles about 2 months ago

Article

Merge Large Language Models with mergekit

Jan 9

• 67

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16

• 242

upvoted a collection about 2 months ago

NuminaMath

Collection

Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 6 items • Updated Jul 21 • 54

upvoted an article about 2 months ago

Article

Train a Llama model from scratch

Jul 29

• 39

upvoted a paper 3 months ago

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28 • 93

upvoted an article 3 months ago

Article

SeeMoE: Implementing a MoE Vision Language Model from Scratch

Jun 23

• 31

upvoted 4 papers 3 months ago

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17 • 48

upvoted an article 3 months ago

Article

Uncensor any LLM with abliteration

Jun 13

• 312

upvoted 2 papers 4 months ago

Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models

Paper • 2311.08692 • Published Nov 15, 2023 • 12

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published May 19 • 53

upvoted 2 articles 4 months ago

Article

makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch

May 7

• 36

Article

Unbelievable! Run 70B LLM Inference on a Single 4GB GPU with This NEW Technique

Nov 30, 2023

• 17

upvoted a paper 4 months ago

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 103

upvoted 4 papers 5 months ago

VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Paper • 2404.10667 • Published Apr 16 • 15

Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Paper • 2403.16990 • Published Mar 25 • 24

TransformerFAM: Feedback attention is working memory

Paper • 2404.09173 • Published Apr 14 • 43

Grandmaster-Level Chess Without Search

Paper • 2402.04494 • Published Feb 7 • 65

upvoted a paper 6 months ago

ReALM: Reference Resolution As Language Modeling

Paper • 2403.20329 • Published Mar 29 • 20

upvoted 11 papers 7 months ago

Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 75

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 590

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88

Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures

Paper • 2402.05424 • Published Feb 8 • 17

SDXL-Lightning: Progressive Adversarial Diffusion Distillation

Paper • 2402.13929 • Published Feb 21 • 27

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

Paper • 2402.10128 • Published Feb 15 • 14

OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

Paper • 2402.10176 • Published Feb 15 • 34

GraphCast: Learning skillful medium-range global weather forecasting

Paper • 2212.12794 • Published Dec 24, 2022 • 1

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Paper • 2402.07033 • Published Feb 10 • 16

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

Paper • 2402.04833 • Published Feb 7 • 6

MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices

Paper • 2311.16567 • Published Nov 28, 2023 • 22

upvoted a collection 8 months ago

Qwen1.5 GGUF

Collection

GGUF quants for the new Qwen1.5 model (https://qwenlm.github.io/blog/qwen1.5/) • 5 items • Updated Feb 5 • 10

upvoted 3 papers 8 months ago

Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23 • 86

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 140

I am a Strange Dataset: Metalinguistic Tests for Language Models

Paper • 2401.05300 • Published Jan 10 • 4

upvoted a collection 8 months ago

Best for RP on mobile dGPU

Collection

Models without twee romantic language, absurd bad erotica cliches or low coherence. These models are top of their weight class. • 3 items • Updated 13 days ago • 3

upvoted 4 papers 8 months ago

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 157

WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

Paper • 2305.14292 • Published May 23, 2023 • 1

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Paper • 2401.01325 • Published Jan 2 • 26

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

Paper • 2305.05084 • Published May 8, 2023 • 1

upvoted 2 papers 9 months ago

LLaMA Pro: Progressive LLaMA with Block Expansion

Paper • 2401.02415 • Published Jan 4 • 53

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5 • 40
