# llm-d v0.5.0
Released: February 3, 2026
Full Release Notes: View on GitHub
The llm-d ecosystem consists of multiple interconnected components that work together to provide distributed inference capabilities for large language models.
## Components
| Component | Description | Repository | Version |
|---|---|---|---|
| Inference Scheduler | The scheduler that makes optimized routing decisions for inference requests to the llm-d inference framework. | llm-d/llm-d-inference-scheduler | v0.5.0 |
| Model Service | A Helm chart that simplifies LLM deployment on llm-d by declaratively managing the Kubernetes resources needed to serve base models. It enables reproducible, scalable, and tunable model deployments through modular presets and clean integration with llm-d ecosystem components (including vLLM, the Gateway API Inference Extension, and LeaderWorkerSet). | llm-d-incubation/llm-d-modelservice | llm-d-modelservice-v0.4.5 |
| Inference Simulator | A lightweight vLLM simulator that emulates responses to vLLM's HTTP REST endpoints. | llm-d/llm-d-inference-sim | v0.7.1 |
| Infrastructure | A Helm chart for deploying the gateway and related infrastructure assets for llm-d. | llm-d-incubation/llm-d-infra | v1.3.6 |
| KV Cache | The libraries for tokenization, KV-events processing, and KV-cache indexing and offloading. | llm-d/llm-d-kv-cache | v0.5.0 |
| Benchmark Tools | An automated workflow for benchmarking LLM inference with the llm-d stack, including tools for deployment, experiment execution, data collection, and teardown across multiple environments and deployment styles. | llm-d/llm-d-benchmark | v0.3.0 |
| Workload Variant Autoscaler | Graduated from experimental to core component. Provides saturation-based autoscaling for llm-d deployments. | llm-d-incubation/workload-variant-autoscaler | v0.5.0 |
| Gateway API Inference Extension | A Helm chart to deploy an InferencePool, a corresponding EndpointPicker (epp) deployment, and any other related assets. | kubernetes-sigs/gateway-api-inference-extension | v1.3.0 |
## Container Images
Container images are published to the GitHub Container Registry.
```
ghcr.io/llm-d/<image-name>:<version>
```
| Image | Description | Version | Image Reference |
|---|---|---|---|
| llm-d-cuda | CUDA-based inference image for NVIDIA GPUs | v0.5.0 | ghcr.io/llm-d/llm-d-cuda:v0.5.0 |
| llm-d-xpu | Intel XPU inference image | v0.5.0 | ghcr.io/llm-d/llm-d-xpu:v0.5.0 |
| llm-d-cpu | CPU-only inference image (New in v0.5.0) | v0.5.0 | ghcr.io/llm-d/llm-d-cpu:v0.5.0 |
| llm-d-inference-scheduler | Inference scheduler for optimized routing | v0.5.0 | ghcr.io/llm-d/llm-d-inference-scheduler:v0.5.0 |
| llm-d-routing-sidecar | Routing sidecar for request redirection | v0.5.0 | ghcr.io/llm-d/llm-d-routing-sidecar:v0.5.0 |
| llm-d-inference-sim | Lightweight vLLM simulator | v0.7.1 | ghcr.io/llm-d/llm-d-inference-sim:v0.7.1 |
Note: The `llm-d-aws` image is deprecated in this release.
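Assuming a standard container runtime is available, an image from the table above can be pulled by substituting the image name and version into the registry template. This is a minimal sketch; the `docker` CLI is illustrative, and any OCI-compatible client (e.g. `podman`) works the same way:

```shell
#!/bin/sh
# Compose a full image reference from the registry template:
#   ghcr.io/llm-d/<image-name>:<version>
IMAGE_NAME="llm-d-cuda"
VERSION="v0.5.0"
IMAGE_REF="ghcr.io/llm-d/${IMAGE_NAME}:${VERSION}"
echo "${IMAGE_REF}"

# Pull the image (requires network access and a container runtime):
# docker pull "${IMAGE_REF}"
```

Running this prints `ghcr.io/llm-d/llm-d-cuda:v0.5.0`, the same reference shown in the table's first row.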
## Getting Started
Each component has its own detailed documentation page accessible from the sidebar. For a comprehensive view of how these components work together, see the main Architecture Overview.
## Quick Links
- Main llm-d Repository - Core platform and orchestration
- llm-d-incubation Organization - Experimental and supporting components
- Full Release Notes - Release v0.5.0
- All Releases - Complete release history
## Previous Releases
For information about previous versions and their features, visit the GitHub Releases page.
## Contributing
To contribute to any of these components, visit their respective repositories and follow their contribution guidelines. Each component maintains its own development workflow and contribution process.