dag-based workflow orchestration with hierarchical concurrency control
Hatchet executes complex multi-step workflows defined as directed acyclic graphs (DAGs) stored in the v1_dag table, with built-in hierarchical concurrency management that enforces resource limits at workflow, step, and action levels. The system uses a state machine approach for task lifecycle management (v1_task table) with automatic persistence, enabling workflows to survive service restarts and coordinate dependencies across distributed workers via gRPC streaming.
Unique: Implements hierarchical concurrency control (workflow-level, step-level, action-level semaphores) with fairness scheduling specifically optimized for LLM rate limiting, rather than generic task queue concurrency. Uses PostgreSQL partitioning for v1_task table to scale task state management without sharding application logic.
vs alternatives: More sophisticated than Celery/RQ for concurrency fairness; lighter than Airflow/Prefect by eliminating scheduler overhead through event-driven task assignment via gRPC streaming.
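The hierarchical limits described above can be sketched as nested semaphores that a task must acquire at every level before running. This is an illustrative stand-in, not Hatchet's implementation; the level names and limits are assumptions:

```python
import threading
from contextlib import ExitStack

class HierarchicalLimiter:
    """Sketch of hierarchical concurrency control: a task must hold a
    slot at every level (workflow, step, action) before it may run."""

    def __init__(self, limits):
        # limits: {"workflow": 8, "step": 4, "action": 2} (illustrative)
        self._sems = {level: threading.BoundedSemaphore(n)
                      for level, n in limits.items()}

    def acquire(self):
        """Acquire all levels in a fixed order (avoids deadlock between
        tasks); slots release in reverse order when the stack exits."""
        stack = ExitStack()
        for level in ("workflow", "step", "action"):
            sem = self._sems[level]
            sem.acquire()
            stack.callback(sem.release)
        return stack

limiter = HierarchicalLimiter({"workflow": 8, "step": 4, "action": 2})
with limiter.acquire():
    pass  # run the task; at most 2 run concurrently at the "action" level
```

Acquiring levels in one fixed global order is the standard way to make multi-semaphore acquisition deadlock-free.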
event-driven workflow triggering with cel expression matching
Hatchet triggers workflow runs in response to external events using a CEL (Common Expression Language) expression matcher stored in v1_filter and v1_match tables. When an event is published to the system, the dispatcher evaluates CEL expressions against event payloads to determine which workflows should be triggered, enabling complex conditional logic without hardcoding trigger rules. This architecture decouples event producers from workflow definitions.
Unique: Uses CEL (Common Expression Language) for event matching instead of regex or hardcoded rules, enabling rich conditional logic that stays sandboxed and is guaranteed to terminate (CEL is deliberately not Turing-complete). Stores filter definitions in v1_filter table, allowing triggers to be updated without redeploying workers.
vs alternatives: More expressive than webhook path-based routing; simpler than building custom event processors with Kafka Streams or Flink.
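The dispatch pattern (stored filter expressions evaluated against event payloads) can be sketched as follows. A restricted `eval()` stands in for a real CEL evaluator here purely to keep the example self-contained; actual CEL is sandboxed and terminating, and the workflow ids and fields below are invented for illustration:

```python
# (workflow_id, expression) rows, as if loaded from a filter table
filters = [
    ("resize-image", "event['type'] == 'upload' and event['size'] > 1024"),
    ("notify-admin", "event['type'] == 'error'"),
]

def match_workflows(event):
    """Return the workflow ids whose filter expression matches the payload."""
    matched = []
    for workflow_id, expr in filters:
        # stand-in for CEL evaluation: expressions see only the payload,
        # with no access to builtins
        if eval(expr, {"__builtins__": {}}, {"event": event}):
            matched.append(workflow_id)
    return matched

print(match_workflows({"type": "upload", "size": 4096}))
```

Because the expressions live in a table rather than in code, producers can publish events without knowing which workflows (if any) will fire.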
horizontal scaling via dispatcher sharding and worker pool management
Hatchet scales horizontally by running multiple dispatcher instances, each managing a subset of worker connections based on worker affinity or hash-based sharding. Workers register with a specific dispatcher instance, and the system routes task assignments to the appropriate dispatcher based on worker availability. The architecture supports adding/removing dispatcher instances without downtime, with workers automatically reconnecting to available dispatchers on failure.
Unique: Implements dispatcher sharding with worker affinity-based routing, allowing horizontal scaling of task assignment throughput without central bottleneck. Workers register with specific dispatcher instances and automatically reconnect on failure.
vs alternatives: More scalable than single-dispatcher architecture; simpler than Kafka-based task distribution but requires careful sharding configuration.
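One common way to realize hash-based sharding of workers across dispatcher instances is a consistent-hash ring, sketched below. This is an assumption about the general technique, not Hatchet's actual routing code; the class and names are invented:

```python
import hashlib
from bisect import bisect

class DispatcherRing:
    """Sketch of hash-based sharding: map worker ids onto a ring of
    dispatcher instances, so adding or removing a dispatcher only
    remaps a fraction of workers instead of reshuffling all of them."""

    def __init__(self, dispatchers, vnodes=64):
        # virtual nodes smooth out the distribution across dispatchers
        self._ring = sorted(
            (self._hash(f"{d}#{i}"), d)
            for d in dispatchers for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def dispatcher_for(self, worker_id):
        """Deterministically pick the dispatcher owning this worker."""
        idx = bisect(self._keys, self._hash(worker_id)) % len(self._ring)
        return self._ring[idx][1]

ring = DispatcherRing(["dispatcher-a", "dispatcher-b", "dispatcher-c"])
owner = ring.dispatcher_for("worker-1")
```

On reconnect, a worker can recompute its owning dispatcher from the same ring, which is what makes failover routing deterministic.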
observability and telemetry with structured logging and metrics export
Hatchet includes built-in observability through structured logging (api/v1/server/middleware/telemetry/telemetry.go) and metrics export to OpenTelemetry-compatible backends. The system logs task execution events, worker lifecycle events, and API requests with structured fields (tenant_id, workflow_id, task_id) for easy filtering and correlation. Metrics include task latency, success rates, worker utilization, and dispatcher throughput, exported via OpenTelemetry SDK.
Unique: Implements structured logging with correlation IDs (tenant_id, workflow_id, task_id) and OpenTelemetry metrics export, enabling end-to-end tracing across dispatcher, workers, and API. Logs are JSON-formatted for easy parsing by log aggregation platforms.
vs alternatives: More comprehensive than basic logging; simpler than custom instrumentation but requires external observability platform for full value.
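The structured-logging shape described above (JSON lines carrying correlation ids) can be sketched with the standard library alone. The formatter below is illustrative, not Hatchet's logger:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, attaching correlation ids
    (tenant_id, workflow_id, task_id) as structured fields when present."""

    def format(self, record):
        entry = {"level": record.levelname, "msg": record.getMessage()}
        for key in ("tenant_id", "workflow_id", "task_id"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)

logger = logging.getLogger("hatchet-sketch")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("task started",
            extra={"tenant_id": "t1", "workflow_id": "wf9", "task_id": "task42"})
```

Keeping the correlation ids as top-level JSON fields (rather than interpolated into the message string) is what lets log aggregation platforms filter and join on them directly.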
postgresql-based message queue (pgmq) as alternative to rabbitmq
Hatchet supports PostgreSQL-backed PGMQ as a built-in message queue alternative to RabbitMQ, eliminating the need for a separate message broker in simpler deployments. PGMQ stores queues in PostgreSQL tables and plugs into the same internal queue interface as the RabbitMQ backend, so no external broker is required. This is suitable for deployments where PostgreSQL is already required and operational complexity should be minimized.
Unique: Provides PostgreSQL PGMQ as a built-in message queue alternative to RabbitMQ, eliminating external broker dependencies for simpler deployments. Uses PostgreSQL tables for queue storage behind the same internal queue interface as the RabbitMQ backend.
vs alternatives: Simpler than RabbitMQ for small deployments; lower throughput but fewer operational dependencies.
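The core mechanism of a table-backed queue is a visibility timeout: a claimed message becomes invisible to other consumers until it is acknowledged or the timeout lapses. The sketch below uses SQLite so it is self-contained; a real PostgreSQL implementation would claim rows with `FOR UPDATE SKIP LOCKED` so concurrent consumers never grab the same message. The schema and function names are illustrative, not PGMQ's actual interface:

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE queue (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    visible_at REAL NOT NULL DEFAULT 0)""")

def send(payload):
    db.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))

def read(visibility_timeout=30.0):
    """Claim the oldest visible message; it stays invisible to other
    consumers until the timeout elapses or it is deleted (acked)."""
    now = time.time()
    row = db.execute(
        "SELECT id, payload FROM queue WHERE visible_at <= ? "
        "ORDER BY id LIMIT 1", (now,)).fetchone()
    if row is None:
        return None
    db.execute("UPDATE queue SET visible_at = ? WHERE id = ?",
               (now + visibility_timeout, row[0]))
    return row

def delete(msg_id):
    db.execute("DELETE FROM queue WHERE id = ?", (msg_id,))

send('{"task": "demo"}')
msg = read()           # claimed; invisible to other readers
assert read() is None  # nothing else visible until the timeout
delete(msg[0])         # ack
```

If a consumer crashes before acking, the message simply becomes visible again after the timeout, which is how at-least-once delivery falls out of the design.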
workflow versioning and rollback with immutable run history
Hatchet stores workflow definitions with versioning, allowing multiple versions of a workflow to coexist. Each workflow run is bound to a specific workflow version, ensuring that historical runs can be replayed or analyzed against the exact workflow definition that executed them. The system maintains immutable run history in the v1_workflow_run table, preventing accidental modification of historical data.
Unique: Implements workflow versioning with immutable run history, binding each run to a specific workflow version. Enables safe workflow updates without affecting in-flight runs and maintains audit trail of all workflow changes.
vs alternatives: More robust than unversioned workflows; simpler than full workflow state machine versioning in Temporal.
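The version-pinning described above reduces to one invariant: a run stores a reference to the exact version it executed, and versions are immutable once registered. A minimal sketch, with invented structures that do not reflect Hatchet's schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkflowVersion:
    workflow: str
    version: int
    steps: tuple  # immutable step list

@dataclass(frozen=True)
class WorkflowRun:
    run_id: str
    version: WorkflowVersion  # the run is bound to one exact version

versions = {}

def register(workflow, steps):
    """Register a new immutable version; existing runs are untouched."""
    n = sum(1 for (name, _) in versions if name == workflow) + 1
    v = WorkflowVersion(workflow, n, tuple(steps))
    versions[(workflow, n)] = v
    return v

v1 = register("etl", ["extract", "load"])
run = WorkflowRun("run-1", v1)
v2 = register("etl", ["extract", "transform", "load"])  # new version

# the historical run still sees the definition it actually executed
assert run.version.steps == ("extract", "load")
```

Because in-flight and historical runs hold a pinned version, publishing a new definition can never change what an existing run does, which is the property that makes replay and auditing trustworthy.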
real-time task assignment via grpc streaming with worker heartbeat monitoring
Hatchet's dispatcher service (dispatcher_v1.go) maintains persistent gRPC streaming connections to workers, pushing task assignments in real time rather than having workers poll a queue. The dispatcher monitors worker heartbeats and automatically reassigns tasks from dead workers. Workers declare their available capacity over the stream, and the dispatcher matches queued tasks to that declared capacity, combining push-based delivery with worker-driven flow control. This architecture reduces latency and enables fair scheduling across heterogeneous worker pools.
Unique: Uses persistent gRPC streaming for push-based task assignment instead of pull-based polling, with automatic heartbeat-based failure detection and task reassignment. Dispatcher maintains worker registration state and matches tasks to workers based on declared availability, enabling fair scheduling without explicit queue management.
vs alternatives: Lower latency than Redis/RabbitMQ polling-based queues; more sophisticated failure detection than simple timeout-based reassignment.
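The heartbeat side of this can be sketched as a reaper that returns a stale worker's in-flight tasks to the pending queue. Timeout value, state layout, and names are illustrative assumptions, not Hatchet's internals:

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds without a heartbeat before a worker is dead

heartbeats = {}   # worker_id -> timestamp of last heartbeat
assignments = {}  # worker_id -> set of in-flight task ids
pending = []      # tasks awaiting (re)assignment

def heartbeat(worker_id, now=None):
    """Record a liveness signal from a worker."""
    heartbeats[worker_id] = time.time() if now is None else now

def reap_dead_workers(now):
    """Drop workers whose heartbeat is stale and requeue their tasks."""
    for worker_id, last in list(heartbeats.items()):
        if now - last > HEARTBEAT_TIMEOUT:
            pending.extend(assignments.pop(worker_id, ()))
            del heartbeats[worker_id]

heartbeat("w1", now=100.0)
assignments["w1"] = {"task-1"}
reap_dead_workers(now=102.0)  # within the window: nothing happens
reap_dead_workers(now=110.0)  # stale: task-1 returns to pending
```

In the real system the heartbeat arrives over the same gRPC stream as task traffic, so a dropped connection and a missed heartbeat are detected by the same machinery.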
automatic task retry with exponential backoff and timeout enforcement
Hatchet persists task state in the v1_task table with built-in retry logic that automatically re-executes failed tasks using exponential backoff (configurable base and max multiplier). Each task has a timeout enforced at the dispatcher level; if a task exceeds its timeout, the dispatcher marks it as failed and triggers the retry mechanism. The system tracks retry count and can enforce a maximum retry limit, with all retry history persisted for debugging.
Unique: Implements dispatcher-enforced timeouts combined with automatic exponential backoff retry, with full retry history persisted in v1_task table. Decouples retry logic from worker implementation, ensuring consistent behavior across heterogeneous worker pools.
vs alternatives: More sophisticated than simple retry loops in application code; less flexible than Temporal's activity retry policies but simpler to operate.
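The backoff schedule itself is simple enough to state directly. The base, multiplier, cap, and retry limit below are illustrative defaults, not Hatchet's configured values; full jitter is one common variant of the technique:

```python
import random

def backoff_delay(attempt, base=1.0, multiplier=2.0, max_delay=60.0):
    """Delay before retry `attempt` (0-indexed): exponential growth
    capped at max_delay, with full jitter to avoid thundering herds."""
    ceiling = min(base * (multiplier ** attempt), max_delay)
    return random.uniform(0, ceiling)

def should_retry(retry_count, max_retries=5):
    """Give up once the persisted retry count reaches the limit."""
    return retry_count < max_retries

# un-jittered ceilings grow 1, 2, 4, 8, ... seconds, capped at 60
delay = backoff_delay(3)  # somewhere in [0, 8.0]
```

Persisting the retry count with the task (rather than in worker memory) is what lets the limit survive worker crashes and dispatcher restarts.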
+6 more capabilities