---
title: README
emoji: 🐢
colorFrom: purple
colorTo: purple
sdk: static
pinned: false
---

Text Generation Inference (TGI) is an open-source, purpose-built solution for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation using Tensor Parallelism and dynamic batching for the most popular open-source LLMs, including StarCoder, BLOOM, GPT-NeoX, Llama, and T5. Text Generation Inference is already used by customers such as IBM, Grammarly, and the Open-Assistant initiative. TGI implements optimizations for all supported model architectures, including:

- Tensor Parallelism and custom CUDA kernels
- Optimized transformers code for inference using Flash Attention and Paged Attention on the most popular architectures
- Quantization with bitsandbytes or GPTQ
- Continuous batching of incoming requests for increased total throughput
- Accelerated weight loading (start-up time) with safetensors
- Logits warpers (temperature scaling, top-k, repetition penalty, ...)
- Watermarking with *A Watermark for Large Language Models*
- Stop sequences, log probabilities
- Token streaming using Server-Sent Events (SSE)

## Currently optimized architectures

- [BLOOM](https://huggingface.co/bigscience/bloom)
- [FLAN-T5](https://huggingface.co/google/flan-t5-xxl)
- [Galactica](https://huggingface.co/facebook/galactica-120b)
- [GPT-NeoX](https://huggingface.co/EleutherAI/gpt-neox-20b)
- [Llama](https://github.com/facebookresearch/llama)
- [OPT](https://huggingface.co/facebook/opt-66b)
- [SantaCoder](https://huggingface.co/bigcode/santacoder)
- [StarCoder](https://huggingface.co/bigcode/starcoder)
- [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b)
- [Falcon 40B](https://huggingface.co/tiiuae/falcon-40b)

## Check out the source code 👉

- the server backend: https://github.com/huggingface/text-generation-inference
- the Chat UI: https://huggingface.co/spaces/text-generation-inference/chat-ui

## Check out examples

- [Introducing the Hugging Face LLM Inference Container for Amazon SageMaker](https://huggingface.co/blog/sagemaker-huggingface-llm)
- [Deploy LLMs with Hugging Face Inference Endpoints](https://huggingface.co/blog/inference-endpoints-llm)
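## Quick taste: streaming tokens over SSE

The token-streaming feature above can be sketched as a small client. This is a minimal, hedged example, not official TGI client code: it assumes a TGI server is already running at `http://localhost:8080` exposing the `/generate_stream` endpoint, which emits Server-Sent Events lines of the form `data:{...}` containing a `token.text` field. The `parse_sse_line` helper and the default URL are illustrative choices for this sketch.

```python
import json


def parse_sse_line(line: str):
    """Parse one Server-Sent Events line into a dict.

    Returns None for non-data lines (comments, keep-alives, blanks).
    """
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def stream_tokens(prompt: str, base_url: str = "http://localhost:8080"):
    """Yield generated token texts from a running TGI server.

    Requires the third-party `requests` package and a reachable server;
    the payload shape follows TGI's generate API (inputs + parameters).
    """
    import requests  # pip install requests

    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 32, "temperature": 0.7},
    }
    with requests.post(
        f"{base_url}/generate_stream", json=payload, stream=True
    ) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines(decode_unicode=True):
            event = parse_sse_line(raw or "")
            if event is not None:
                yield event["token"]["text"]


if __name__ == "__main__":
    # Prints tokens as they arrive, instead of waiting for the full reply.
    for text in stream_tokens("What is tensor parallelism?"):
        print(text, end="", flush=True)
```

Streaming this way lets a chat UI render partial output immediately, which is the point of the SSE endpoint compared with a single blocking request.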