Kubernetes Deployment (Helm + Manifest Fallback)
Purpose
Deploy Orloj on Kubernetes with a Helm chart (recommended) or with raw manifests (fallback).
Prerequisites
- Kubernetes cluster access (kubectl context configured)
- Helm 3 (helm)
- curl, jq for verification (and go if running orlojctl from source)
The release workflow publishes orlojd and orlojworker container images plus the Helm chart to GHCR — you do not need to build anything yourself unless you're deploying from a local checkout.
Install
1. Install with Helm (Recommended)
The chart is published as an OCI artifact on every v* release:
helm upgrade --install orloj oci://ghcr.io/orlojhq/charts/orloj \
--version 0.14.2 \
--namespace orloj \
--create-namespace \
--set postgresql.auth.password='<strong-password>' \
--set secretEncryptionKey="$(openssl rand -hex 32)" \
--set auth.mode=native \
--set auth.setupToken="$(openssl rand -hex 32)"
Notes:
- The chart defaults image.registry, image.server.repository, and image.worker.repository to the published GHCR images, so you do not need to set them.
- secretEncryptionKey is a 256-bit AES key used to encrypt provider API keys at rest in Postgres. Generate it with openssl rand -hex 32 and store it as you would any other root secret.
- auth.mode=native requires auth.setupToken for first-user bootstrap. See Operations > Security.
- Model provider API keys (Anthropic, OpenAI, Bedrock, etc.) are not chart values; they are encrypted Secret resources you create via orlojctl after the control plane is up, and ModelEndpoint resources reference them by name. See ModelEndpoint.
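Because the shell evaluates $(openssl rand -hex 32) every time the command runs, re-running the install command as written passes a fresh secretEncryptionKey to Helm on each run. How the chart handles a changed value is not covered here, so it is safer to generate the key once and reuse it. A minimal sketch, assuming a local key file (the ensure_key helper and the file path are hypothetical; a secret manager would be the more usual home for a root secret):

```shell
#!/bin/sh
# Sketch: create a 256-bit hex key once and reuse it on later runs.
# ensure_key and the key file location are illustrative, not part of Orloj.

# gen_key: print 32 random bytes as 64 hex characters.
gen_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback when openssl is unavailable.
    od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
    echo
  fi
}

# ensure_key FILE: print the key stored in FILE, generating it first if absent.
ensure_key() {
  key_file=$1
  if [ ! -s "$key_file" ]; then
    # Restrict permissions before the secret is written.
    (umask 077 && gen_key > "$key_file")
  fi
  cat "$key_file"
}

# Usage (not run here):
#   KEY=$(ensure_key "$HOME/.orloj/secret-encryption-key")
#   then pass --set secretEncryptionKey="$KEY" to helm upgrade --install
```

The same pattern applies to auth.setupToken if the install command may be run more than once before the first admin account exists.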
To inspect effective values:
helm get values orloj --namespace orloj
Install from a source checkout
If you've cloned the repo and want to deploy a development build, you can install from the chart directory directly. Subchart deps must be resolved first:
helm dependency update charts/orloj
helm upgrade --install orloj ./charts/orloj \
--namespace orloj \
--create-namespace \
--set postgresql.auth.password='<strong-password>' \
--set secretEncryptionKey="$(openssl rand -hex 32)" \
--set auth.mode=native \
--set auth.setupToken="$(openssl rand -hex 32)"
To pin custom image tags (for example, a locally-built image pushed to your own registry), add:
--set image.registry=ghcr.io/<your-org> \
--set image.server.repository=<your-org>/orloj-orlojd \
--set image.server.tag=<your-tag> \
--set image.worker.repository=<your-org>/orloj-orlojworker \
--set image.worker.tag=<your-tag>
GitOps (ArgoCD, Flux)
ArgoCD Application example pointing at the OCI chart:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: orloj
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ghcr.io/orlojhq/charts
    chart: orloj
    targetRevision: 0.14.2
    helm:
      valueFiles:
        - values.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: orloj
  syncPolicy:
    automated: { prune: true, selfHeal: true }
    syncOptions: [ CreateNamespace=true ]
2. Manifest Fallback (No Helm)
If you cannot use Helm, apply the baseline manifest set:
- Edit docs/deploy/kubernetes/orloj-stack.yaml image references and rotate the baseline secrets (Postgres password, secret encryption key, setup token).
- Apply manifests:
kubectl apply -f docs/deploy/kubernetes/orloj-stack.yaml
Verify
Wait for rollouts. Resource names in a Helm install follow the <release>-<component> convention; with helm install orloj ... you get:
kubectl -n orloj rollout status statefulset/orloj-postgresql
kubectl -n orloj rollout status statefulset/orloj-nats
kubectl -n orloj rollout status deploy/orloj-server
kubectl -n orloj rollout status deploy/orloj-worker
If you used the manifest fallback, the names are unprefixed:
kubectl -n orloj rollout status deploy/postgres
kubectl -n orloj rollout status deploy/nats
kubectl -n orloj rollout status deploy/orlojd
kubectl -n orloj rollout status deploy/orlojworker
Port-forward the API service:
# Helm install
kubectl -n orloj port-forward svc/orloj-server 8080:8080
# Manifest fallback
kubectl -n orloj port-forward svc/orlojd 8080:8080
In another terminal:
curl -s http://127.0.0.1:8080/healthz | jq .
orlojctl --server http://127.0.0.1:8080 get workers
orlojctl --server http://127.0.0.1:8080 apply -f examples/blueprints/pipeline/ --run
orlojctl --server http://127.0.0.1:8080 get task bp-pipeline-task
Done means:
- all rollouts are successful.
- API service is reachable through port-forward.
- at least one worker is Ready.
- sample task reaches Succeeded.
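The rollout checks above can be wrapped in a small script so the same verification works for either install method. This is a sketch only: rollout_targets and wait_for_rollouts are hypothetical helpers, the workload names are copied from the commands in this section, and the 300s timeout is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: wait for every Orloj workload, for a Helm or manifest install.
# Helper names and the timeout are illustrative assumptions.

# rollout_targets METHOD [RELEASE]: print one kind/name per line.
rollout_targets() {
  method=$1
  release=${2:-orloj}
  case "$method" in
    helm)
      # Helm names follow <release>-<component>.
      printf '%s\n' \
        "statefulset/$release-postgresql" \
        "statefulset/$release-nats" \
        "deploy/$release-server" \
        "deploy/$release-worker"
      ;;
    manifest)
      # Manifest fallback names are unprefixed.
      printf '%s\n' deploy/postgres deploy/nats deploy/orlojd deploy/orlojworker
      ;;
    *)
      echo "unknown install method: $method" >&2
      return 1
      ;;
  esac
}

# wait_for_rollouts METHOD [RELEASE] [NAMESPACE]: stop at the first failed rollout.
wait_for_rollouts() {
  namespace=${3:-orloj}
  targets=$(rollout_targets "$1" "${2:-orloj}") || return 1
  for target in $targets; do
    kubectl -n "$namespace" rollout status "$target" --timeout=300s || return 1
  done
}

# Usage (not run here):
#   wait_for_rollouts helm orloj orloj && echo "all rollouts successful"
```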
Operate
Scale workers (Helm install):
kubectl -n orloj scale deploy/orloj-worker --replicas=3
kubectl -n orloj rollout status deploy/orloj-worker
For long-term scaling, prefer the HPA values:
worker:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
Restart control plane:
kubectl -n orloj rollout restart deploy/orloj-server
kubectl -n orloj rollout status deploy/orloj-server
View logs:
kubectl -n orloj logs deploy/orloj-server --tail=200
kubectl -n orloj logs deploy/orloj-worker --tail=200
Upgrade chart release:
helm upgrade orloj oci://ghcr.io/orlojhq/charts/orloj \
--version <new-version> --namespace orloj --reuse-values
Rollback:
helm rollback orloj <revision> --namespace orloj
Troubleshoot
- pods in ImagePullBackOff: verify image names/tags and registry access.
- workers not processing: verify ORLOJ_AGENT_MESSAGE_CONSUME=true and message-bus env values.
- tasks not created: verify the API endpoint is reachable from orlojctl.
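For the "workers not processing" case, one way to check the environment variable is to list the worker Deployment's env and look up the value. The sketch below assumes the Helm workload name deploy/orloj-worker and that the variable is set as a literal value on the container; env_value and check_consume_enabled are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: confirm ORLOJ_AGENT_MESSAGE_CONSUME=true on the worker Deployment.
# Helper names are illustrative; input is KEY=VALUE lines such as the output
# of kubectl set env --list.

# env_value NAME: print the value of NAME from KEY=VALUE lines on stdin.
# Exits non-zero when NAME is not present.
env_value() {
  awk -v key="$1" -F= '
    $1 == key { print substr($0, length(key) + 2); found = 1 }
    END { exit !found }
  '
}

# check_consume_enabled: read KEY=VALUE lines on stdin and report the setting.
check_consume_enabled() {
  value=$(env_value ORLOJ_AGENT_MESSAGE_CONSUME) || {
    echo "ORLOJ_AGENT_MESSAGE_CONSUME is not set"
    return 1
  }
  if [ "$value" != "true" ]; then
    echo "ORLOJ_AGENT_MESSAGE_CONSUME is '$value', expected 'true'"
    return 1
  fi
  echo "ok"
}

# Usage (not run here):
#   kubectl -n orloj set env deploy/orloj-worker --list | check_consume_enabled
```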
Tool Isolation: Kubernetes Backend
When toolIsolation.kubernetes.enabled=true, Orloj runs tool invocations with isolation_mode: kubernetes as ephemeral Kubernetes Jobs in the cluster. This eliminates the need for a Docker socket on worker nodes.
RBAC Requirements
The Helm chart automatically creates a Role (and RoleBinding) for the worker ServiceAccount with the following permissions:
| API Group | Resource | Verbs |
|---|---|---|
| batch | jobs | create, get, list, watch, delete |
| (core) | pods | get, list |
| (core) | pods/log | get |
| (core) | secrets | get |
The Role is scoped to the namespace configured by toolIsolation.kubernetes.namespace (defaults to the release namespace).
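To confirm the Role took effect, each permission in the table can be checked with kubectl auth can-i while impersonating the worker ServiceAccount. This is a sketch under assumptions: the ServiceAccount name passed in is whatever your release created (orloj-worker in the usage line is a guess, not confirmed by the chart), the helper names are hypothetical, and your own user needs permission to impersonate ServiceAccounts.

```shell
#!/bin/sh
# Sketch: verify the tool-isolation Role by impersonating the worker ServiceAccount.
# Helper names are illustrative; rules mirror the RBAC table above.

# sa_subject NAMESPACE SERVICE_ACCOUNT: print the impersonation subject.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# tool_rbac_rules: print "verb resource [subresource]" per line.
tool_rbac_rules() {
  cat <<'EOF'
create jobs.batch
get jobs.batch
list jobs.batch
watch jobs.batch
delete jobs.batch
get pods
list pods
get pods log
get secrets
EOF
}

# check_tool_rbac NAMESPACE SERVICE_ACCOUNT: print yes/no for every rule.
check_tool_rbac() {
  ns=$1
  subject=$(sa_subject "$ns" "$2")
  tool_rbac_rules | while read -r verb resource sub; do
    printf '%s %s%s: ' "$verb" "$resource" "${sub:+/$sub}"
    # stdin is redirected so kubectl cannot consume the rule list.
    kubectl auth can-i "$verb" "$resource" ${sub:+--subresource="$sub"} \
      --as="$subject" -n "$ns" </dev/null
  done
}

# Usage (not run here; the ServiceAccount name is an assumption):
#   check_tool_rbac orloj orloj-worker
```

If toolIsolation.kubernetes.namespace is set, pass that namespace as the first argument instead of the release namespace.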
Helm Values
Configure the Kubernetes tool isolation backend under toolIsolation.kubernetes:
toolIsolation:
  kubernetes:
    enabled: false # Set to true to enable
    namespace: "" # Namespace for tool Jobs (default: release namespace)
    serviceAccount: "" # Service account for tool Pods (default: worker SA)
    jobTTLSeconds: 300 # TTL seconds after Job finishes (automatic cleanup)
    defaultImage: "curlimages/curl:8.8.0" # Fallback image for HTTP tools
When enabled, the chart sets ORLOJ_TOOL_K8S_ENABLED=true plus related env vars on both the orlojd server and orlojworker deployments.
Coexistence with Container Backend
Both container and kubernetes isolation backends can be active simultaneously. Each tool's spec.runtime.isolation_mode selects which backend handles that tool:
- isolation_mode: container runs the tool via docker run on the worker host
- isolation_mode: kubernetes runs the tool as a Kubernetes Job in the cluster
This allows gradual migration from Docker-based isolation to Kubernetes-native execution.
Agent Execution: Kubernetes Backend
When agentExecution.kubernetes.enabled=true, Orloj runs each agent in a multi-agent task as an ephemeral Kubernetes Job instead of executing it in-process on the worker. This isolates agent execution at the pod level and allows independent scaling of agent workloads.
Agents whose tools require Docker (container isolation mode or stdio MCP servers with a container image) automatically fall back to in-process execution.
RBAC Requirements
The Helm chart creates a Role (and RoleBindings for both the worker and server ServiceAccounts) with the following permissions:
| API Group | Resource | Verbs |
|---|---|---|
| batch | jobs | create, get, list, watch, delete |
| (core) | pods | get, list |
| (core) | pods/log | get |
The Role is scoped to the namespace configured by agentExecution.kubernetes.namespace (defaults to the release namespace).
Helm Values
agentExecution:
  kubernetes:
    enabled: false # Set to true to enable
    namespace: "" # Namespace for agent Jobs (default: release namespace)
    serviceAccount: "" # Service account for agent Pods
    image: "" # Container image (default: worker image)
    jobTTLSeconds: 600 # TTL seconds after Job finishes
    defaultMemory: "512Mi" # Default memory limit for agent Pods
    defaultCPU: "500m" # Default CPU limit for agent Pods
How It Works
- The orchestrator (worker or server) checks whether the agent can run as a K8s Job (CanRunAsJob).
- If eligible, it writes agent input to the task's status in Postgres and creates a K8s Job running the worker image with --single-agent mode.
- The agent pod reads its input from Postgres, executes the agent, and writes the result back.
- The orchestrator watches the Job for completion and reads the result.
- If the orchestrator crashes and restarts, it detects the existing Job by its deterministic name and resumes watching.
Crash Recovery
Agent Jobs use deterministic names based on the task, agent, and attempt number. If the orchestrator pod restarts mid-execution, it detects the existing Job and either reads its result (if complete) or resumes watching (if still running).
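The reason deterministic names make recovery possible is that a restarted orchestrator can recompute the same name and look the Job up instead of creating a duplicate. The sketch below illustrates the idea only; the naming scheme, the agent- prefix, and both helper names are hypothetical and are not Orloj's actual implementation.

```shell
#!/bin/sh
# Sketch: derive a stable Job name from task, agent, and attempt, then
# create the Job only if it does not already exist.
# The naming scheme here is an illustrative assumption.

# agent_job_name TASK AGENT ATTEMPT: print a stable, DNS-safe name.
agent_job_name() {
  raw="$1-$2-$3"
  # Lowercase and replace anything outside [a-z0-9-] with '-'.
  slug=$(printf '%s' "$raw" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9-' '-')
  # A checksum of the raw inputs keeps truncated names distinct.
  sum=$(printf '%s' "$raw" | cksum | cut -d ' ' -f 1)
  # 'agent-' (6) + slug (<=40) + '-' (1) + checksum (<=10) stays under 63 chars.
  printf 'agent-%s-%s\n' "$(printf '%s' "$slug" | cut -c 1-40)" "$sum"
}

# ensure_agent_job NAME NAMESPACE MANIFEST: resume if the Job exists, else create it.
ensure_agent_job() {
  if kubectl -n "$2" get job "$1" >/dev/null 2>&1; then
    echo "resuming existing Job $1"
  else
    kubectl -n "$2" create -f "$3"
  fi
}

# Usage (not run here; job.yaml is a placeholder manifest):
#   name=$(agent_job_name my-task researcher 1)
#   ensure_agent_job "$name" orloj job.yaml
```

Because the same inputs always yield the same name, calling ensure_agent_job again after a crash finds the running or completed Job rather than starting a second one.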
A2A Protocol
To configure public A2A Agent Card URLs in a Helm deployment, set the A2A public base URL. Individual AgentSystems are exposed with spec.a2a.enabled: true.
helm upgrade orloj ./charts/orloj --namespace orloj --reuse-values \
--set a2a.publicBaseURL=https://orloj.example.com
See the Chart README for the full list of a2a.* values and their defaults.
CRD Sync Operator (Optional)
The Orloj CRD operator exposes Orloj resources (Agents, Tools, AgentSystems, etc.) as real Kubernetes custom resources backed by Custom Resource Definitions. When enabled, you can manage configuration with kubectl apply and integrate with GitOps tools like Argo CD and Flux.
helm upgrade --install orloj oci://ghcr.io/orlojhq/charts/orloj \
--namespace orloj --reuse-values \
--set operator.enabled=true \
--set operator.installCRDs=true
The operator is independent of tool isolation and agent execution backends; it manages the configuration plane (resource definitions), not the execution plane (how tools and agents run). You can use any combination.
See Kubernetes CRD Operator for full documentation, values reference, GitOps examples, and migration guide.
Security Defaults
- This baseline is not HA: server.replicaCount defaults to 1. Multi-replica orlojd requires leader election (see roadmap).
- Rotate secrets before non-test use:
  - postgresql.auth.password (or postgresql.auth.existingSecret for a pre-sealed value).
  - secretEncryptionKey: losing this makes every encrypted Orloj Secret unrecoverable.
  - auth.setupToken: single-use bootstrap; rotate after the first admin account is created.
  - auth.apiToken: set this only if you also need a static bearer for CLI/automation; otherwise rely on user-issued tokens minted through the native auth flow.
- ORLOJ_AUTH_MODE defaults to native (the chart's auth.mode value).
- auth.mode=off disables authentication entirely and is intended only for local development.
- Restrict namespace and service exposure based on cluster policy. The chart's server.ingress is opt-in and emits a networking.k8s.io/v1 Ingress; for Gateway API environments, leave server.ingress.enabled=false and ship an HTTPRoute alongside the release.