Capability

Request Scheduling With Prefill Decode Disaggregation

2 artifacts provide this capability.


Top Matches

Matched via “request scheduling with prefill-decode disaggregation”

Fast LLM/VLM serving with RadixAttention, prefix caching, structured output, and automatic parallelism.

Unique: Schedules prefill and decode separately, each with its own batch size and priority. This enables continuous batching, in which new requests are added to the decode queue without blocking prefill operations.

vs others: Achieves lower time-to-first-token (TTFT) than vLLM through prefill-decode disaggregation and continuous batching, and higher decode throughput by using larger decode batch sizes.
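The listing does not describe how this scheduler is implemented. The following is a minimal sketch of the general idea, not the artifact's actual code. It assumes two separate limits (a small prefill batch and a larger decode batch). It also assumes that prefill takes priority at each step and that new requests are admitted only while decode slots are free. `Request`, `DisaggregatedScheduler`, `max_prefill_batch`, and `max_decode_batch` are hypothetical names introduced here for illustration.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record: id plus a token budget.
    rid: int
    max_new_tokens: int
    generated: int = 0

    @property
    def done(self) -> bool:
        return self.generated >= self.max_new_tokens


class DisaggregatedScheduler:
    """Hypothetical sketch: separate prefill queue and decode batch,
    each with its own size limit (assumed policy, not the real one)."""

    def __init__(self, max_prefill_batch: int = 2, max_decode_batch: int = 8):
        self.max_prefill_batch = max_prefill_batch  # small: bounds TTFT work per step
        self.max_decode_batch = max_decode_batch    # large: raises decode throughput
        self.prefill_queue: deque[Request] = deque()
        self.decode_running: list[Request] = []
        self.finished: list[int] = []

    def submit(self, req: Request) -> None:
        # New requests wait only for prefill, not for running decodes to finish.
        self.prefill_queue.append(req)

    def step(self) -> list[int]:
        """Run one scheduling step; return ids of requests prefilled in it."""
        # 1. Prefill first (assumed priority), capped by the prefill batch
        #    size and by the decode slots still free.
        free_slots = self.max_decode_batch - len(self.decode_running)
        n = min(self.max_prefill_batch, free_slots, len(self.prefill_queue))
        prefilled = [self.prefill_queue.popleft() for _ in range(n)]
        for r in prefilled:
            r.generated = 1  # prefill emits the first token

        # 2. Decode one token for every request already in the decode batch.
        for r in self.decode_running:
            r.generated += 1

        # 3. Continuous batching: freshly prefilled requests join the
        #    running decode batch instead of waiting for it to drain.
        self.decode_running.extend(prefilled)

        # 4. Retire finished sequences, freeing decode slots for the next step.
        still_running = []
        for r in self.decode_running:
            if r.done:
                self.finished.append(r.rid)
            else:
                still_running.append(r)
        self.decode_running = still_running
        return [r.rid for r in prefilled]
```

For example, with `max_prefill_batch=2` and three submitted requests, the first `step()` prefills requests 0 and 1. The next step prefills request 2 and simultaneously decodes the first two. Keeping the prefill batch small limits how long any one step delays first tokens. The decode batch can be larger because each decode step adds only one token per sequence.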
