Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads Paper • 2407.17678 • Published Jul 25 • 2
MemLong: Memory-Augmented Retrieval for Long Text Modeling Paper • 2408.16967 • Published Aug 30 • 1 • 2
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning Paper • 2409.06679 • Published Sep 10 • 2
UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs Paper • 2406.18173 • Published Jun 26 • 2
Eigen Attention: Attention in Low-Rank Space for KV Cache Compression Paper • 2408.05646 • Published Aug 10 • 2
Theory, Analysis, and Best Practices for Sigmoid Self-Attention Paper • 2409.04431 • Published Sep 6 • 1 • 2
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach Paper • 2407.16833 • Published Jul 23 • 2
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published Aug 20 • 10 • 3
Parallelizing Autoregressive Generation with Variational State Space Models Paper • 2407.08415 • Published Jul 11 • 2
Transformer Language Models without Positional Encodings Still Learn Positional Information Paper • 2203.16634 • Published Mar 30, 2022 • 5 • 2
Position Prediction as an Effective Pretraining Strategy Paper • 2207.07611 • Published Jul 15, 2022 • 1 • 2
HyperAttention: Long-context Attention in Near-Linear Time Paper • 2310.05869 • Published Oct 9, 2023 • 2 • 2
IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs Paper • 2405.02842 • Published May 5 • 1 • 2
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters Paper • 2408.04093 • Published Aug 7 • 4 • 2
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads Paper • 2407.15891 • Published Jul 22 • 2
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention Paper • 2406.15486 • Published Jun 17 • 2
LongHeads: Multi-Head Attention is Secretly a Long Context Processor Paper • 2402.10685 • Published Feb 16 • 1 • 2
Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope Paper • 2407.15176 • Published Jul 21 • 2
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory Paper • 2402.04617 • Published Feb 7 • 4 • 2
Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model Paper • 2405.14174 • Published May 23 • 2
Longhorn: State Space Models are Amortized Online Learners Paper • 2407.14207 • Published Jul 19 • 16 • 3
Mixture of Nested Experts: Adaptive Processing of Visual Tokens Paper • 2407.19985 • Published Jul 29 • 33 • 4
RecycleGPT: An Autoregressive Language Model with Recyclable Module Paper • 2308.03421 • Published Aug 7, 2023 • 7 • 2
SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding Paper • 2406.18200 • Published Jun 26 • 2
Memory^3: Language Modeling with Explicit Memory Paper • 2407.01178 • Published Jul 1 • 3 • 2
Crafting the Path: Robust Query Rewriting for Information Retrieval Paper • 2407.12529 • Published Jul 17 • 2
Conversational Query Reformulation with the Guidance of Retrieved Documents Paper • 2407.12363 • Published Jul 17 • 2
CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search Paper • 2406.05013 • Published Jun 7 • 2
Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers Paper • 2406.10991 • Published Jun 16 • 2
Factual Dialogue Summarization via Learning from Large Language Models Paper • 2406.14709 • Published Jun 20 • 2
AdaCQR: Enhancing Query Reformulation for Conversational Search via Sparse and Dense Retrieval Alignment Paper • 2407.01965 • Published Jul 2 • 2
Automatically Generating Numerous Context-Driven SFT Data for LLMs across Diverse Granularity Paper • 2405.16579 • Published May 26 • 2
Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model Paper • 2407.03040 • Published Jul 3 • 2
Synthesizing Conversations from Unlabeled Documents using Automatic Response Segmentation Paper • 2406.03703 • Published Jun 6 • 1 • 2
Stateful Memory-Augmented Transformers for Dialogue Modeling Paper • 2209.07634 • Published Sep 15, 2022 • 1 • 2
Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness Paper • 2406.04156 • Published Jun 6 • 2
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization Paper • 2106.12672 • Published Jun 23, 2021 • 2
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining Paper • 2311.08849 • Published Nov 15, 2023 • 5 • 4
Exploring Design Choices for Building Language-Specific LLMs Paper • 2406.14670 • Published Jun 20 • 1 • 2
MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization Paper • 2407.08818 • Published Jul 11 • 2
Word-Level Representation From Bytes For Language Modeling Paper • 2211.12677 • Published Nov 23, 2022 • 2
Adaptive Draft-Verification for Efficient Large Language Model Decoding Paper • 2407.12021 • Published Jul 2024 • 2
Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training Paper • 2406.17404 • Published Jun 25 • 1 • 2
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters Paper • 2406.16758 • Published Jun 24 • 19 • 3
S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models Paper • 2407.01955 • Published Jul 2 • 2
Optimizing Speculative Decoding for Serving Large Language Models Using Goodput Paper • 2406.14066 • Published Jun 20 • 1 • 2
Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism Paper • 2406.03853 • Published Jun 6 • 2
OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure Paper • 2406.17276 • Published Jun 25 • 2
Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference Paper • 2405.18628 • Published May 28 • 2
Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style Paper • 2406.13170 • Published Jun 19 • 2
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression Paper • 2407.12077 • Published Jul 16 • 52 • 8
Neurocache: Efficient Vector Retrieval for Long-range Language Modeling Paper • 2407.02486 • Published Jul 2 • 2
Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens Paper • 2406.10985 • Published Jun 16 • 2
Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs Paper • 2402.14903 • Published Feb 22 • 1
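Many of the decoding entries above (MagicDec, SEED, S2D, OPT-Tree, Amphista, and the goodput paper, among others) are refinements of the same draft-and-verify loop. The sketch below is only a minimal, generic illustration of that loop in its greedy form, not a reproduction of any listed paper's algorithm: `speculative_decode`, `draft_next`, and `target_next` are hypothetical names, the two lambda "models" in the usage example are toys, and the batched verification pass that gives real systems their speedup is deliberately elided.

```python
from typing import Callable, List

def speculative_decode(
    prefix: List[int],
    draft_next: Callable[[List[int]], int],
    target_next: Callable[[List[int]], int],
    k: int = 4,
    max_new_tokens: int = 32,
) -> List[int]:
    """Greedy draft-and-verify loop: the output matches plain greedy decoding
    with the target model, token for token."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new_tokens:
        # 1) Draft phase: the small model proposes k tokens autoregressively.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2) Verify phase: the target model checks each drafted position and keeps
        #    the longest prefix it agrees with; on the first mismatch it substitutes
        #    its own token, and on full acceptance it appends one bonus token.
        accepted: List[int] = []
        for i in range(k):
            t = target_next(out + accepted)
            accepted.append(t)  # the target's choice always ends up in the output
            if t != draft[i]:
                break           # mismatch: discard the remaining draft tokens
        else:
            accepted.append(target_next(out + accepted))  # bonus token
        out.extend(accepted)
    return out[: len(prefix) + max_new_tokens]

# Toy usage with hypothetical stand-in "models": the draft predicts previous
# token + 1; the target agrees except when the context length is a multiple
# of 5, so some draft blocks get cut short.
if __name__ == "__main__":
    draft = lambda seq: seq[-1] + 1
    target = lambda seq: seq[-1] + 1 if len(seq) % 5 else seq[-1] + 2
    print(speculative_decode([0], draft, target, k=3, max_new_tokens=10))
```

The greedy variant is shown because it is the simplest version whose output provably matches ordinary greedy decoding with the target model; the stochastic variant used in practice replaces the equality check with rejection sampling over the draft and target token distributions.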