Deployment
Ductwork is designed to run reliably in production environments, from traditional servers to containerized deployments. By default, bin/ductwork starts a supervisor that forks child processes: one pipeline advancer and one job worker per configured pipeline. This works well for traditional deployments but conflicts with container best practices.
The container problem: Containers expect a single foreground process. Forking introduces complications with signal handling, logging, and process management; container orchestrators like Kubernetes work best when they can directly manage each process.
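To make the signal-handling point concrete, here is a minimal Ruby sketch of the bookkeeping a forking parent takes on: it has to forward SIGTERM to each child and reap every one of them before it exits. The method names `spawn_children` and `shutdown_children` are hypothetical, and this is not Ductwork's supervisor code, only an illustration of the general pattern.

```ruby
# Hypothetical sketch of a forking parent; not Ductwork's implementation.

# Fork `count` children that idle until they receive SIGTERM.
# Returns the list of child PIDs.
def spawn_children(count)
  Array.new(count) do
    ready_r, ready_w = IO.pipe
    pid = fork do
      ready_r.close
      Signal.trap("TERM") { exit!(0) } # exit at once, skipping at_exit hooks
      ready_w.write("1")               # tell the parent the handler is installed
      ready_w.close
      sleep                            # stand-in for real work
    end
    ready_w.close
    ready_r.read(1)                    # block until the child reports ready
    ready_r.close
    pid
  end
end

# Forward SIGTERM to every child, then reap each one so no zombie is left.
# Returns the Process::Status of each child.
def shutdown_children(pids)
  pids.each { |pid| Process.kill("TERM", pid) }
  pids.map { |pid| Process.wait2(pid).last }
end
```

When each role runs as its own container, the orchestrator delivers the signal directly to the single process, so none of this forwarding is needed.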
The role Configuration
Ductwork provides a role configuration that allows you to run a single process type without forking. This aligns with the container philosophy of one process per container.
Set the role in your YAML configuration:
Advancer:

    default: &default
      role: advancer
      pipelines: "*"

    production:
      <<: *default

Worker:

    default: &default
      role: worker
      pipelines:
        - EnrichUserDataPipeline
        - ProcessOrdersPipeline

    production:
      <<: *default

Or override via environment variable:

    DUCTWORK_ROLE=advancer bin/ductwork

Docker and Containerized Deployments
Container-per-Role Strategy
Run separate containers for each role. This gives you independent scaling, cleaner logs, and better resource isolation.
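Each container then needs to know which role it is running. As a rough sketch of the precedence described above (the `DUCTWORK_ROLE` environment variable overrides the YAML value, and supervisor mode is the default), the logic might look like the following. The `effective_role` helper and the `VALID_ROLES` list are assumptions made for illustration; Ductwork's real configuration loader may work differently.

```ruby
require "yaml"

# Hypothetical helper, not part of Ductwork's API.
# The set of accepted role names is an assumption based on this page.
VALID_ROLES = %w[supervisor advancer worker].freeze

# Resolve the role for one environment section of the YAML config.
# Precedence: DUCTWORK_ROLE env var, then the YAML `role` key, then "supervisor".
def effective_role(yaml_text, env_name, env = ENV)
  # aliases: true is required so `<<: *default` merge keys are accepted.
  config = YAML.safe_load(yaml_text, aliases: true) || {}
  section = config[env_name] || {}
  role = env["DUCTWORK_ROLE"] || section["role"] || "supervisor"
  raise ArgumentError, "unknown role: #{role}" unless VALID_ROLES.include?(role)

  role
end
```

For example, `effective_role(File.read("config/ductwork.yml"), "production")` would return `"worker"` for the worker configuration shown earlier, unless `DUCTWORK_ROLE` is set in the container's environment.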
Dockerfile:
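
    FROM ruby:3.3-slim

    WORKDIR /app

    # Install gems first so this layer stays cached until the Gemfile changes.
    COPY Gemfile Gemfile.lock ./
    # `bundle install --without` is deprecated; set the groups via bundle config.
    RUN bundle config set --local without "development test" && bundle install

    COPY . .

    CMD ["bin/ductwork"]

docker-compose.yml: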

    version: "3.8"

    services:
      pipeline_advancer:
        build: .
        command: bin/ductwork -c config/ductwork.yml
        environment:
          DUCTWORK_ROLE: advancer
          RAILS_ENV: production
          DATABASE_URL: postgres://db:5432/myapp
        depends_on:
          - db
        restart: unless-stopped

      job_worker_enrichment:
        build: .
        command: bin/ductwork -c config/ductwork.enrichment.yml
        environment:
          DUCTWORK_ROLE: worker
          RAILS_ENV: production
          DATABASE_URL: postgres://db:5432/myapp
        depends_on:
          - db
          - pipeline_advancer
        deploy:
          replicas: 3
        restart: unless-stopped

      job_worker_orders:
        build: .
        command: bin/ductwork -c config/ductwork.orders.yml
        environment:
          DUCTWORK_ROLE: worker
          RAILS_ENV: production
          DATABASE_URL: postgres://db:5432/myapp
        depends_on:
          - db
          - pipeline_advancer
        deploy:
          replicas: 2
        restart: unless-stopped

      db:
        image: postgres:16
        volumes:
          - postgres_data:/var/lib/postgresql/data

    volumes:
      postgres_data:

Why Separate Containers?
Scaling: Scale job workers independently of the advancer. If your bottleneck is step execution, add more worker containers without spinning up redundant advancers.
Resource allocation: Assign different CPU and memory limits based on workload characteristics. Advancers are typically lightweight; workers may need more resources depending on your step logic.
Logging: Each container’s stdout is a single, coherent log stream. No interleaved output from multiple processes.
Failure isolation: A crash in a job worker doesn’t affect the advancer or other workers. The orchestrator restarts only what failed.
Kubernetes Deployment
Kubernetes deployments follow the same container-per-role pattern.
Pipeline Advancer Deployment

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ductwork-advancer
      labels:
        app: ductwork
        component: advancer
    spec:
      replicas: 1 # Usually only need one
      selector:
        matchLabels:
          app: ductwork
          component: advancer
      template:
        metadata:
          labels:
            app: ductwork
            component: advancer
        spec:
          containers:
            - name: advancer
              image: myapp:latest
              command: ["bin/ductwork"]
              args: ["-c", "config/ductwork.yml"]
              env:
                - name: DUCTWORK_ROLE
                  value: advancer
                - name: RAILS_ENV
                  value: production
                - name: DATABASE_URL
                  valueFrom:
                    secretKeyRef:
                      name: app-secrets
                      key: database-url
              resources:
                requests:
                  memory: "256Mi"
                  cpu: "100m"
                limits:
                  memory: "512Mi"
                  cpu: "500m"
              livenessProbe:
                exec:
                  command:
                    - pgrep
                    - -f
                    - ductwork
                initialDelaySeconds: 10
                periodSeconds: 30
          terminationGracePeriodSeconds: 45

Job Worker Deployment

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ductwork-worker
      labels:
        app: ductwork
        component: worker
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: ductwork
          component: worker
      template:
        metadata:
          labels:
            app: ductwork
            component: worker
        spec:
          containers:
            - name: worker
              image: myapp:latest
              command: ["bin/ductwork"]
              args: ["-c", "config/ductwork.yml"]
              env:
                - name: DUCTWORK_ROLE
                  value: worker
                - name: RAILS_ENV
                  value: production
                - name: DATABASE_URL
                  valueFrom:
                    secretKeyRef:
                      name: app-secrets
                      key: database-url
              resources:
                requests:
                  memory: "512Mi"
                  cpu: "250m"
                limits:
                  memory: "1Gi"
                  cpu: "1000m"
              livenessProbe:
                exec:
                  command:
                    - pgrep
                    - -f
                    - ductwork
                initialDelaySeconds: 10
                periodSeconds: 30
          terminationGracePeriodSeconds: 45

Horizontal Pod Autoscaling
Scale workers based on CPU utilization or custom metrics:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: ductwork-worker-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: ductwork-worker
      minReplicas: 2
      maxReplicas: 20
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70

Health Checks
For orchestrators that support health checks, you can verify the Ductwork process is running:
Simple process check:

    pgrep -f ductwork

Heartbeat-based check: Query the ductwork_processes table for recent heartbeats from the current container. A process that hasn’t reported a heartbeat in 5 minutes is considered unhealthy.

    # Example health check endpoint.
    # As written, this is true if any process has reported a heartbeat in the
    # last 5 minutes; scope the query to the current container's process
    # record to check only that container.
    Ductwork::Process
      .where('last_heartbeat_at > ?', 5.minutes.ago)
      .exists?

Traditional (Non-Containerized) Deployment
For VMs or bare-metal servers, use the default supervisor mode with a process manager like systemd:

    [Unit]
    Description=Ductwork Pipeline Processor
    After=network.target postgresql.service

    [Service]
    Type=simple
    User=deploy
    WorkingDirectory=/var/www/myapp/current
    ExecStart=/var/www/myapp/current/bin/ductwork -c config/ductwork.yml
    Restart=always
    RestartSec=5
    Environment=RAILS_ENV=production

    [Install]
    WantedBy=multi-user.target

In this mode, the supervisor handles process management internally, restarting crashed children automatically.
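As an illustration of what restarting crashed children involves, the following Ruby sketch runs a unit of work in a forked child and starts a replacement whenever the child exits with a failure. The `supervise` method and its `max_restarts` limit are hypothetical; Ductwork's own supervisor may use a different policy (for example, it may restart indefinitely or back off between attempts).

```ruby
# Hypothetical sketch of a supervisor's restart loop; Ductwork's own
# supervisor may behave differently.

# Runs `work` in a forked child and restarts it whenever it exits with a
# failure, up to `max_restarts` times. Returns how many children were started.
def supervise(max_restarts:, &work)
  started = 0
  loop do
    pid = fork do
      begin
        work.call
        exit!(0) # success; exit! skips at_exit hooks inherited from the parent
      rescue StandardError
        exit!(1) # report the crash to the parent through the exit status
      end
    end
    started += 1

    _, status = Process.wait2(pid)            # block until the child exits, and reap it
    return started if status.success?         # clean exit: nothing to restart
    return started if started > max_restarts  # restart budget exhausted: give up
  end
end
```

With systemd's `Restart=always` on top, there are two layers of recovery: the supervisor replaces individual children, and systemd replaces the supervisor itself if it dies.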
Summary
| Deployment Type | Role Config | Process Management |
|---|---|---|
| Traditional (VM/bare-metal) | supervisor (default) | systemd, upstart, etc. |
| Docker/Kubernetes | advancer or worker | Orchestrator handles restarts |
The role configuration lets you choose the model that fits your infrastructure. For containers, run separate services for each role. For traditional deployments, let the supervisor manage child processes.