Capability
Adaptive Dynamic Batching With Configurable Queue And Timeout Policies
5 artifacts provide this capability.
ML model serving framework: package models as Bentos, with adaptive batching, GPU acceleration, and distributed serving.
Unique: Implements task-queue-based batching at the serving layer with per-endpoint configuration, giving fine-grained control over batch size, timeout, and queue strategy without changes to model code; batching is integrated directly into the request-processing pipeline.
vs others: More efficient than application-level batching (e.g., in FastAPI middleware) because it operates at the worker-process level with direct access to model execution, reducing context switching and enabling better GPU memory management.
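The core mechanism described above — a queue that flushes a batch when either a size limit or a wait timeout is reached — can be sketched generically. This is a minimal illustration, not the framework's actual implementation; all names (`AdaptiveBatcher`, `submit`, `max_batch_size`, `max_wait_s`) are hypothetical, and the parameters stand in for the per-endpoint batch-size and timeout policies the capability refers to.

```python
import queue
import threading
import time
from concurrent.futures import Future

class AdaptiveBatcher:
    """Hypothetical sketch: collect requests into batches, flushing on size or timeout."""

    def __init__(self, handler, max_batch_size=8, max_wait_s=0.01):
        self._handler = handler            # maps a list of inputs to a list of outputs
        self._queue = queue.Queue()        # task queue decoupling callers from the worker
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, item):
        """Enqueue one request; the returned Future resolves with its result."""
        fut = Future()
        self._queue.put((item, fut))
        return fut

    def _run(self):
        while True:
            # Block until the first request arrives, then start the timeout window.
            batch = [self._queue.get()]
            deadline = time.monotonic() + self._max_wait_s
            # Keep accepting requests until the batch is full or the window closes.
            while len(batch) < self._max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            inputs = [item for item, _ in batch]
            try:
                outputs = self._handler(inputs)
                for (_, fut), out in zip(batch, outputs):
                    fut.set_result(out)
            except Exception as exc:
                for _, fut in batch:
                    fut.set_exception(exc)

# Usage: a toy "model" that doubles each input.
batcher = AdaptiveBatcher(lambda xs: [x * 2 for x in xs],
                          max_batch_size=4, max_wait_s=0.005)
futures = [batcher.submit(i) for i in range(10)]
print([f.result() for f in futures])  # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Because the worker sees whole batches, the handler can run one vectorized model call per batch instead of one per request — the same reason serving-layer batching outperforms per-request middleware.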