Supervisor
The supervisor is the orchestrator of your Ductwork infrastructure. As the parent process for each bin/ductwork instance, it launches and monitors all child processes, ensuring your pipelines continue running even when individual processes fail.
Overview
When you run bin/ductwork, you start a supervisor process that:
- Launches one pipeline advancer process
- Launches one job worker process for each configured pipeline
- Monitors all child processes through heartbeat checks
- Automatically restarts failed or hung processes
- Coordinates graceful shutdown across all processes
The supervisor acts as the resilience layer, making Ductwork pipelines fault-tolerant without manual intervention.
Process Hierarchy
Each Ductwork instance creates the following process tree:
supervisor (bin/ductwork)
├── pipeline advancer
│   └── thread for Pipeline A
│   └── thread for Pipeline B
│   └── thread for Pipeline C
├── job worker (Pipeline A)
│   └── worker thread 1
│   └── worker thread 2
│   └── ...
├── job worker (Pipeline B)
│   └── worker thread 1
│   └── worker thread 2
│   └── ...
└── job worker (Pipeline C)
    └── worker thread 1
    └── worker thread 2
    └── ...
Responsibilities
Process Lifecycle Management
The supervisor manages the complete lifecycle of child processes:
Startup:
- Read configuration from YAML file
- Fork the pipeline advancer process
- Fork one job worker process per configured pipeline
- Register signal handlers for graceful shutdown
- Enter monitoring loop
Monitoring:
- Check heartbeats from each child process
- Track process health and uptime
- Detect crashes or hangs
- Log process status changes
Recovery:
- Automatically restart failed processes
- Maintain pipeline availability during failures
- Preserve pipeline state through process restarts
Shutdown:
- Forward shutdown signals to all children
- Wait for graceful shutdown with timeout
- Terminate unresponsive processes
- Clean up resources and exit
Heartbeat Monitoring
The supervisor continuously monitors child process health through periodic heartbeats. Each child process reports its status at regular intervals, confirming it’s alive and processing work.
Detection: If a child process does not report a heartbeat for 5 minutes, the supervisor treats it as failed, whether the cause is a crash, a hang, or a deadlock.
Recovery: The supervisor immediately spawns a replacement process to restore full pipeline capacity. The new process picks up where the previous one left off, resuming work on pending jobs.
Why 5 minutes? This timeout balances quick failure detection with tolerance for legitimately slow operations. Steps should typically complete in seconds, but this buffer accounts for temporarily degraded performance without false positives.
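The detection rule can be sketched in a few lines of Ruby. This is a minimal illustration rather than Ductwork's actual implementation: the Child struct, the stale_children helper, and the in-memory timestamps are assumptions made for the example, and the page does not say where heartbeats are really stored.

```ruby
HEARTBEAT_TIMEOUT = 5 * 60 # seconds; the 5-minute window described above

# Hypothetical record of a supervised child: role label, PID, last heartbeat time.
Child = Struct.new(:role, :pid, :last_heartbeat_at)

# Returns the children whose last heartbeat is older than the timeout.
# A heartbeat exactly at the timeout boundary still counts as healthy.
def stale_children(children, now: Time.now, timeout: HEARTBEAT_TIMEOUT)
  children.select { |child| now - child.last_heartbeat_at > timeout }
end

now = Time.now
children = [
  Child.new("pipeline advancer", 101, now - 2),         # reported 2 seconds ago
  Child.new("job worker (Pipeline A)", 102, now - 301)  # silent for just over 5 minutes
]

stale_children(children, now: now).each do |child|
  puts "restart needed: #{child.role} (pid #{child.pid})"
end
```

Only the job worker is reported here, because the advancer's heartbeat is well inside the window.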
Configuration
The supervisor’s behavior is controlled through config/ductwork.yml:
pipelines
Specifies which pipelines to run. The supervisor creates child processes based on this configuration.
default: &default
  pipelines:
    - EnrichUserDataPipeline
    - ProcessOrdersPipeline
Or use the wildcard to run all defined pipelines:
default: &default
  pipelines: "*"
Note: The supervisor creates a single pipeline advancer, plus one job worker process for each pipeline listed here.
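To make the mapping from configuration to processes concrete, here is a small sketch that turns a pipelines setting into the list of child processes the supervisor would launch. The process_plan helper and the all_pipelines argument are hypothetical; how Ductwork discovers "all defined pipelines" for the wildcard is not described on this page, so the example takes that list as input. It also reads the default section directly for simplicity.

```ruby
require "yaml"

CONFIG_YAML = <<~YAML
  default: &default
    pipelines: "*"
YAML

# Hypothetical helper: one pipeline advancer, plus one job worker per pipeline.
# `all_pipelines` stands in for whatever the application has defined.
def process_plan(config, all_pipelines)
  configured = config.fetch("pipelines")
  pipelines = configured == "*" ? all_pipelines : Array(configured)
  ["pipeline advancer"] + pipelines.map { |name| "job worker (#{name})" }
end

config = YAML.safe_load(CONFIG_YAML, aliases: true).fetch("default")
puts process_plan(config, %w[EnrichUserDataPipeline ProcessOrdersPipeline])
```

With the wildcard and two defined pipelines, the plan has three entries: the advancer and two job workers.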
supervisor.polling_timeout
How long (in seconds) the supervisor sleeps between heartbeat checks.
Default: 1 second
default: &default
  supervisor:
    polling_timeout: 5
Tuning: Shorter intervals provide faster failure detection but increase CPU usage. Longer intervals reduce overhead but delay failure detection. The default (1 second) works well for most applications.
supervisor.shutdown_timeout
Maximum time (in seconds) to wait for child processes to shut down gracefully. After this timeout, remaining processes receive SIGKILL and terminate immediately.
Default: 30 seconds
default: &default
  supervisor:
    shutdown_timeout: 45
Important: This value should be larger than job_worker.shutdown_timeout to allow proper cascading. If the supervisor timeout is too short, workers won’t have time to finish their shutdown sequence.
Recommended values:
- job_worker.shutdown_timeout: 20 seconds
- supervisor.shutdown_timeout: 30 seconds (gives 10 seconds buffer)
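The cascading rule is easy to check mechanically before deploying. The sketch below computes the buffer between the two timeouts from an already-parsed configuration hash. The shutdown_buffer helper is hypothetical and not part of Ductwork, and it expects both values to be set explicitly.

```ruby
# Hypothetical pre-deploy check: how many seconds the supervisor waits beyond
# the job worker's own shutdown timeout. Raises KeyError if either is missing.
def shutdown_buffer(config)
  supervisor = config.fetch("supervisor").fetch("shutdown_timeout")
  worker = config.fetch("job_worker").fetch("shutdown_timeout")
  supervisor - worker
end

config = {
  "supervisor" => { "shutdown_timeout" => 30 },
  "job_worker" => { "shutdown_timeout" => 20 }
}

buffer = shutdown_buffer(config)
warn "supervisor.shutdown_timeout should exceed job_worker.shutdown_timeout" unless buffer.positive?
puts "shutdown buffer: #{buffer}s"
```

With the recommended values this prints a 10-second buffer. A zero or negative result means workers could be killed before they finish their own shutdown sequence.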
Signal Handling
The supervisor responds to Unix signals for control and debugging:
TERM and INT - Graceful Shutdown
Triggers the graceful shutdown sequence:
- Supervisor forwards signal to all child processes
- Child processes begin their shutdown sequences
- Supervisor waits up to supervisor.shutdown_timeout seconds
- Processes still alive after timeout are killed with SIGKILL
- Supervisor exits
# Send TERM signal
kill -TERM <supervisor_pid>

# Or INT signal (both behave identically)
kill -INT <supervisor_pid>
See Signal Handling for detailed shutdown behavior.
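The shutdown sequence follows a common Unix pattern: ask politely, wait with a deadline, then force. The sketch below shows that pattern with Ruby's standard Process API. It is an illustration under assumptions, not Ductwork's source: shutdown_children and monotonic_now are hypothetical names, and the polling interval is arbitrary.

```ruby
# Monotonic clock, so the deadline is unaffected by wall-clock adjustments.
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Hypothetical helper: sends TERM to every child, waits up to `timeout`
# seconds for them to exit, then sends KILL to whatever has not been reaped.
# Returns the PIDs that were still unreaped at the deadline.
def shutdown_children(pids, timeout:, poll_interval: 0.05)
  remaining = pids.dup
  remaining.each { |pid| Process.kill("TERM", pid) }

  deadline = monotonic_now + timeout
  until remaining.empty? || monotonic_now >= deadline
    # With WNOHANG, waitpid returns nil while the child is still running
    # and the PID once it has exited (which also reaps it).
    remaining.reject! { |pid| Process.waitpid(pid, Process::WNOHANG) }
    sleep(poll_interval) unless remaining.empty?
  end

  remaining.each do |pid|
    Process.kill("KILL", pid)
    Process.waitpid(pid) # reap the killed child so no zombie is left behind
  end
  remaining
end
```

Called as shutdown_children(child_pids, timeout: 30), this returns an empty array when every child exits within the window. The PIDs must belong to direct children of the calling process, since waitpid can only reap those.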
TTIN - Thread Backtrace Dump
Requests thread backtraces from all child processes for debugging hung or slow processes.
kill -TTIN <supervisor_pid>
The supervisor forwards this signal to all children, which dump their thread backtraces to the configured logger. This is invaluable for diagnosing performance issues or deadlocks in production.
See TTIN Signal Handling for details.
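For readers unfamiliar with the technique, a thread dump on TTIN can be sketched with Ruby's built-in Thread and Signal APIs. This is a generic illustration, not Ductwork's handler: format_thread_backtraces is a hypothetical name, and the example writes to standard error instead of a logger.

```ruby
# Builds a plain-text report with one section per thread.
def format_thread_backtraces(threads = Thread.list)
  threads.map do |thread|
    header = "Thread #{thread.object_id} name=#{thread.name.inspect} status=#{thread.status.inspect}"
    frames = thread.backtrace || [] # backtrace is nil once a thread has died
    ([header] + frames.map { |frame| "  #{frame}" }).join("\n")
  end.join("\n\n")
end

# Illustrative handler. It writes straight to stderr because Ruby does not
# allow Mutex locking inside a trap handler, and Logger locks internally.
Signal.trap("TTIN") do
  $stderr.write(format_thread_backtraces + "\n")
end
```

A production implementation that wants the output in its logger would typically record the signal in the handler and do the actual logging from a regular thread.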
Lifecycle Hooks
Register actions to run when the supervisor starts or stops:
Ductwork.on_supervisor_start do
  Rails.logger.info "Ductwork supervisor starting"
  # Initialize monitoring, notify deployment tracking, etc.
end

Ductwork.on_supervisor_stop do
  Rails.logger.info "Ductwork supervisor shutting down"
  # Flush metrics, notify monitoring systems, etc.
end
These hooks run once per supervisor lifecycle: at the very beginning of startup and the very end of shutdown. Use them for initialization, cleanup, or integration with external systems.
See Lifecycle Hooks for all available hooks.
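As a mental model for "once at the very beginning, once at the very end", the following stand-in registry stores blocks per event and runs them around a body of work. HookRegistry and its methods are invented for this illustration and say nothing about how Ductwork implements its hooks internally.

```ruby
# Hypothetical stand-in for a hook registry, for illustration only.
module HookRegistry
  @hooks = { supervisor_start: [], supervisor_stop: [] }

  class << self
    # Registers a block for a known event; raises KeyError for unknown events.
    def on(event, &block)
      @hooks.fetch(event) << block
    end

    # Runs every block registered for the event, in registration order.
    def run(event)
      @hooks.fetch(event).each(&:call)
    end

    # Runs start hooks, then the given work, then stop hooks.
    # `ensure` makes the stop hooks run even if the work raises.
    def with_lifecycle
      run(:supervisor_start)
      yield
    ensure
      run(:supervisor_stop)
    end
  end
end

events = []
HookRegistry.on(:supervisor_start) { events << :start }
HookRegistry.on(:supervisor_stop) { events << :stop }
HookRegistry.with_lifecycle { events << :work }
p events
```

The recorded order is start, work, stop, which mirrors the placement of the two supervisor hooks described above.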
Monitoring
Track supervisor health and behavior by monitoring:
Process Metrics
- Supervisor uptime
- Number of child process restarts
- Child process spawn rate
- Failed startup attempts
Resource Usage
- Supervisor CPU and memory usage
- Total memory across all child processes
- Open file descriptors
- Database connection count
Heartbeat Status
- Time since last heartbeat from each child
- Heartbeat check frequency
- Missed heartbeat count
Shutdown Behavior
- Time to complete graceful shutdown
- Number of processes killed after timeout
- Shutdown success rate
Running Multiple Supervisors
You can run multiple bin/ductwork instances to isolate pipelines or scale horizontally:
Isolate Critical Pipelines
# Critical pipelines with dedicated resources
bin/ductwork -c config/ductwork.critical.yml

# Background pipelines on separate instance
bin/ductwork -c config/ductwork.background.yml
# config/ductwork.critical.yml
production:
  pipelines:
    - ProcessPaymentsPipeline
    - SendNotificationsPipeline
  job_worker:
    worker_count: 20

# config/ductwork.background.yml
production:
  pipelines:
    - GenerateReportsPipeline
    - CleanupDataPipeline
  job_worker:
    worker_count: 5
Scale Across Machines
Run separate supervisors on different servers for horizontal scaling:
# Server 1 - Handle user-facing pipelines
bin/ductwork -c config/ductwork.user_facing.yml

# Server 2 - Handle batch processing pipelines
bin/ductwork -c config/ductwork.batch.yml
Benefits:
- Fault isolation (one failing pipeline doesn’t affect others)
- Resource allocation (dedicate CPU/memory to specific pipelines)
- Independent scaling (scale critical pipelines without scaling everything)
- Deployment flexibility (deploy changes to specific pipeline groups)
Considerations:
- More operational complexity
- Higher total resource usage (overhead per supervisor)
- Need coordination for monitoring across instances
Process Management
Integrate Ductwork with your process manager:
systemd
[Unit]
Description=Ductwork Pipeline Supervisor
After=network.target postgresql.service

[Service]
Type=simple
User=deploy
WorkingDirectory=/var/www/myapp
ExecStart=/var/www/myapp/bin/ductwork -c config/ductwork.yml
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
Docker
# Dockerfile
CMD ["bin/ductwork", "-c", "config/ductwork.production.yml"]
Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ductwork
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: ductwork
          image: myapp:latest
          command: ["bin/ductwork"]
          args: ["-c", "config/ductwork.yml"]
The supervisor’s resilient design makes it suitable for containerized environments and orchestration platforms.