Capability
Request Scheduling with Prefill-Decode Disaggregation
Fast LLM/VLM serving — RadixAttention, prefix caching, structured output, automatic parallelism.
Unique: Schedules prefill and decode separately, with different batch sizes and priorities; continuous batching lets newly prefilled requests join the running decode batch without stalling in-flight decoding or blocking incoming prefills.
vs others: Achieves lower time-to-first-token than vLLM through prefill-decode disaggregation and continuous batching, and higher decode throughput by running larger decode batches.
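The scheduling pattern described above can be sketched as two queues driven by one scheduling loop. This is a minimal toy model, not the actual implementation of any serving system: the class name, queue discipline, and batch-size defaults are all assumptions made for illustration. It shows the key idea that prefill admits a small batch of new requests per step, while decode advances a larger batch of in-flight requests, and finished prefills flow into the decode batch continuously.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int             # request id
    max_new_tokens: int  # decode steps this request needs
    generated: int = 0   # tokens decoded so far


class DisaggregatedScheduler:
    """Toy prefill/decode-disaggregated scheduler (hypothetical sketch)."""

    def __init__(self, prefill_batch_size: int = 4, decode_batch_size: int = 16):
        # Decode batch is deliberately larger than the prefill batch,
        # mirroring the higher-decode-throughput claim above.
        self.prefill_queue: deque[Request] = deque()
        self.decode_queue: deque[Request] = deque()
        self.prefill_batch_size = prefill_batch_size
        self.decode_batch_size = decode_batch_size

    def submit(self, req: Request) -> None:
        # New requests wait for prefill; they never interrupt decoding.
        self.prefill_queue.append(req)

    def step(self) -> tuple[list[int], list[int]]:
        """One scheduling step; returns (prefilled ids, decoded ids)."""
        # Prefill a small batch; finished prefills immediately become
        # eligible for decoding (continuous batching).
        prefilled = []
        for _ in range(min(self.prefill_batch_size, len(self.prefill_queue))):
            req = self.prefill_queue.popleft()
            prefilled.append(req.rid)
            self.decode_queue.append(req)

        # Decode a larger batch: one token per in-flight request.
        decoded = []
        batch = [self.decode_queue.popleft()
                 for _ in range(min(self.decode_batch_size,
                                    len(self.decode_queue)))]
        for req in batch:
            req.generated += 1
            decoded.append(req.rid)
            if req.generated < req.max_new_tokens:
                self.decode_queue.append(req)  # unfinished: stays in flight
        return prefilled, decoded
```

Because `step()` always drains up to `prefill_batch_size` new requests before decoding, a burst of arrivals raises time-to-first-token only by the prefill queue delay, while decode throughput stays pinned to the (larger) decode batch.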