What is Orloj?
Orloj is an open-source orchestration plane for multi-agent AI systems. Define your agents, tools, policies, and workflows as declarative YAML manifests. Orloj handles scheduling, execution, model routing, governance enforcement, and reliability -- so you can run multi-agent systems in production with the same operational rigor you expect from infrastructure.
Why Orloj?
Running AI agents in production today looks a lot like running containers before Kubernetes: ad-hoc scripts, no governance, no observability, and no standard way to manage the lifecycle of an agent fleet.
Orloj solves this by providing an orchestration plane purpose-built for AI agent systems:
- Agents become manageable infrastructure. Declare agents, their models, tools, and constraints in version-controlled manifests. Apply them with a single command.
- Multi-agent workflows are first-class. Define pipelines, hierarchies, and swarm topologies as directed graphs. The runtime handles message routing, fan-out/fan-in, and turn-bounded loops.
- Governance is built in, not bolted on. Policies, roles, and tool permissions are enforced at the execution layer. Unauthorized tool calls fail closed -- not silently.
- Production reliability by default. Lease-based task ownership, capped exponential retry with jitter, idempotency tracking, and dead-letter handling are part of the core runtime.
How It Works
- Start the server -- run
orlojdto host the API, resource store, and task scheduler. - Connect workers -- run one or more
orlojworkerinstances that claim and execute tasks. (Or use--embedded-workerfor single-process development.) - Define your system -- write declarative YAML manifests for agents, tools, policies, and the agent graph.
- Submit a task -- apply a Task resource. The scheduler assigns it to a worker, which executes the agent graph and returns results.
Server (orlojd) -- API server, resource storage (in-memory or Postgres), background services, and task scheduler.
Workers (orlojworker) -- task execution, model gateway routing, tool runtime with isolation, and message bus consumers.
Governance -- AgentPolicy, AgentRole, and ToolPermission resources enforced inline during every tool call and model interaction.
You interact with Orloj through orlojctl (the CLI), the REST API, or the built-in web console.
Key Capabilities
| Capability | Description |
|---|---|
| Agents-as-Code | Declarative YAML manifests for agents, systems, tools, policies, and tasks |
| DAG-based orchestration | Pipeline, hierarchical, and swarm-loop topologies with fan-out/fan-in support |
| Model routing | Per-agent model binding via ModelEndpoint resources (OpenAI, Anthropic, Azure OpenAI, Ollama) |
| Tool isolation | Container, WASM, or sandboxed execution with configurable timeouts and retry |
| Governance and RBAC | AgentPolicy, AgentRole, and ToolPermission with fail-closed enforcement |
| Task scheduling | Cron-based schedules and webhook-triggered task creation from external events |
| Reliability | Lease-based ownership, idempotent replay, capped retry with jitter, dead-letter transitions |
| Observability | Task trace, message lifecycle, per-agent/per-edge metrics, and live event streaming |
| Web console | Built-in UI with topology views, task inspection, and command palette |
Scope
Orloj focuses on production orchestration for multi-agent systems. It assumes you have already chosen your models, prompts, and tools -- Orloj's job is to run them reliably at scale with governance enforced.
Get Started
- Install Orloj -- run from source, build binaries, or use Docker Compose.
- Quickstart -- deploy a multi-agent pipeline in under five minutes.
- Explore Concepts -- understand agents, tasks, tools, governance, and the execution model.
- Follow a Guide -- step-by-step tutorials for common workflows.