# Orloj Docs

> Runtime, governance, and orchestration for agent systems.

## Docs

- [API Reference](/reference/api): **Stability: beta** -- This API surface ships with `orloj.dev/v1` and is suitable for production use, but may evolve with migration guidance in future minor releases.
- [CLI Reference](/reference/cli): This page documents command-line interfaces for operating Orloj.
- [Extension Contracts](/reference/extensions): **Stability: beta** -- Extension interfaces are functional and in use, but may evolve additively in future releases.
- [Glossary](/reference/glossary): Canonical definitions for terms used throughout Orloj documentation.
- [Reference](/reference): Detailed contracts and schemas for API consumers, runtime integrators, and platform engineers.
- [Resource Reference](/reference/resources): **Stability: beta** -- All resource kinds under `orloj.dev/v1` are suitable for production use, but their schemas may evolve with migration guidance in future minor releases.
- [Tool Contract v1](/reference/tool-contract-v1): Status: release-candidate contract targeted for Gate 0 stabilization.
- [WASM Tool Module Contract v1](/reference/wasm-tool-module-contract-v1): Status: release-candidate contract targeted for Gate 0 stabilization.
- [Backup and Restore](/operations/backup-restore): This guide covers backup and restore procedures for Orloj deployments using the Postgres storage backend. Memory-backend deployments are ephemeral and do not require backup.
- [Configuration](/operations/configuration): This page defines runtime configuration for `orlojd` and `orlojworker`, plus client-side defaults for `orlojctl` (see also the [CLI reference](/reference/cli)).
- [Operations](/operations): Use this section to run, secure, troubleshoot, and validate Orloj in production-like environments.
- [Live Validation Matrix](/operations/live-validation-matrix): Use this runbook to exercise Orloj with real model providers and a deterministic local tool stub before open source release.
- [Load Testing](/operations/load-testing): Use `orloj-loadtest` to run repeatable reliability scenarios and enforce non-zero quality gates.
- [Monitoring and Alerts](/operations/monitoring-alerts): Use `orloj-alertcheck` and dashboard contracts to validate runtime reliability signals.
- [Observability](/operations/observability): Orloj provides built-in observability through OpenTelemetry tracing, Prometheus metrics, structured logging, and an in-app trace visualization UI. These features work out of the box in OSS deployments and integrate with standard observability backends.
- [Real Tool Validation (Model Decision Gate)](/operations/real-tool-validation): Use this runbook to validate model-selected tool usage in an Anthropic-backed A/B scenario.
- [Operations Runbook](/operations/runbook): Use this runbook for baseline production operation and incident response.
- [Security and Isolation](/operations/security): This page describes current runtime security controls and expected operator practices.
- [Task Scheduling (Cron)](/operations/task-scheduling): Use `TaskSchedule` to create recurring run tasks from a task template.
- [Tool Runtime Conformance](/operations/tool-runtime-conformance): Status: release-candidate specification for Gate 0 contract stabilization.
- [Troubleshooting](/operations/troubleshooting): Use this page for deterministic diagnosis and remediation of common failures.
- [Upgrades and Rollbacks](/operations/upgrades): This guide defines safe upgrade and rollback procedures for the Orloj server and workers.
- [Webhook Triggers](/operations/webhooks): Use `TaskWebhook` to trigger task runs from signed external HTTP events.
- [Build a Custom Tool](/guides/build-custom-tool): This guide is for developers who need to extend agent capabilities by implementing a custom tool. You will implement the Tool Contract v1, register the tool as a resource, configure isolation and retry, and validate it with the conformance harness.
- [Configure Model Routing](/guides/configure-model-routing): This guide is for platform engineers who need to route agents to different model providers. You will set up ModelEndpoints for multiple providers, bind agents to endpoints by reference, and verify that requests route correctly.
- [Connect an MCP Server](/guides/connect-mcp-server): This guide is for platform engineers who want to connect external MCP (Model Context Protocol) servers to Orloj. You will register an MCP server, verify tool discovery, selectively import tools, and assign them to agents.
- [Deploy Your First Pipeline](/guides/deploy-pipeline): This guide is for platform engineers who want to run a multi-agent pipeline end-to-end. You will define three agents, wire them into a sequential graph, submit a task, and inspect the results.
- [Guides](/guides): Step-by-step tutorials for common Orloj workflows. Each guide walks through a complete use case from start to finish, using real manifests from the `examples/` directory.
- [Set Up Multi-Agent Governance](/guides/setup-governance): This guide is for platform engineers who need to enforce tool authorization and model constraints on their agent systems. You will create policies, roles, and tool permissions, deploy a governed agent system, and verify that unauthorized tool calls are denied.
- [Getting Started](/getting-started): Do these in order: install the binaries or run with Docker, then run the quickstart so a real pipeline executes.
- [Install Orloj](/getting-started/install): This guide covers how to install Orloj for local evaluation and production-like use: from source (clone and run or build), from **release binaries** (GitHub Releases), or from **container images** (GitHub Container Registry). Use release artifacts when you want a tagged, published build instead of building from source.
- [Quickstart](/getting-started/quickstart): Get a multi-agent pipeline running in under five minutes. This quickstart uses sequential execution mode -- the simplest way to run Orloj with a single process and no external dependencies.
- [Deployment Overview](/deployment): This section provides setup runbooks by deployment target.
- [Kubernetes Deployment (Helm + Manifest Fallback)](/deployment/kubernetes): Deploy Orloj on Kubernetes with a Helm chart (recommended) or with raw manifests (fallback).
- [Local Deployment](/deployment/local): Run Orloj locally for development and deterministic feature validation.
- [Remote CLI and API access](/deployment/remote-cli-access): This guide is for **operators and users** who already have `orlojd` reachable on a network (self-hosted, VPS, Kubernetes, or internal URL) and need to call the API from **`orlojctl`**, scripts, or CI. It complements the [quickstart](/getting-started/quickstart), which focuses on a single-machine dev loop.
- [VPS Deployment (Compose + systemd)](/deployment/vps): Run Orloj on a single VPS with Docker Compose managed by systemd for automatic restart and reboot recovery.
- [Agents and Agent Systems](/concepts/agents-and-systems): An **Agent** is a declarative unit of work backed by a language model. An **AgentSystem** composes multiple agents into a directed graph that Orloj executes as a coordinated workflow.
- [Governance and Policies](/concepts/governance): Orloj provides a built-in governance layer that controls what agents can do at runtime. Three resource types work together to enforce authorization: **AgentPolicy** constrains execution parameters, **AgentRole** grants named permissions to agents, and **ToolPermission** defines what permissions are required to invoke a tool.
- [Concepts](/concepts): This section explains the core building blocks of Orloj and how they fit together. Each concept page covers what a resource is, why it exists, how to configure it, and how it interacts with the rest of the system.
- [Model Routing](/concepts/model-routing): Orloj decouples agents from specific model providers through **ModelEndpoint** resources. A ModelEndpoint declares a provider, base URL, default model, and authentication -- and agents reference it by name. This lets you swap providers, manage credentials centrally, and route different agents to different models without modifying agent manifests.
- [Tasks and Scheduling](/concepts/tasks-and-scheduling): A **Task** is a request to execute an AgentSystem. Tasks are the unit of work in Orloj -- they carry input, track execution state, and produce output. **TaskSchedules** and **TaskWebhooks** automate task creation from cron expressions and external events.
- [Tools and Isolation](/concepts/tools-and-isolation): A **Tool** is an external capability that agents can invoke during execution. Orloj provides a standardized tool contract, multiple isolation backends, and runtime controls for timeout, retry, and risk classification.
- [Memory](/concepts/memory): Memory gives agents the ability to store, retrieve, and search information across execution steps and across tasks. Orloj implements memory as a layered system: conversation history provides short-term context within a single task turn, a task-scoped shared store lets agents in the same task exchange state, and persistent backends retain knowledge across task runs.
- [Memory Providers](/concepts/memory/providers): Memory providers are the backends that store and retrieve data for Orloj's built-in memory tools. Two paths for connecting a vector database coexist; the page describes both.
- [Execution and Messaging](/architecture/execution-model): This page documents task routing, message lifecycle, and ownership guarantees.
- [Architecture Overview](/architecture/overview): Orloj is organized into three layers: a **server** that manages resources and scheduling, **workers** that execute agent workflows, and a **governance layer** that enforces policies and permissions at runtime.
- [Starter Blueprints](/architecture/starter-blueprints): Blueprints are ready-to-run templates that combine agents, an agent system (the graph), and a task into a single directory. They are the fastest way to see Orloj in action and to understand each orchestration pattern.