Skip to content

Supervisor

The supervisor is the orchestrator of your Ductwork infrastructure. As the parent process for each bin/ductwork instance, it launches and monitors all child processes, ensuring your pipelines continue running even when individual processes fail.

When you run bin/ductwork, you start a supervisor process that:

  • Launches one pipeline advancer process
  • Launches one job worker process for each configured pipeline
  • Monitors all child processes through heartbeat checks
  • Automatically restarts failed or hung processes
  • Coordinates graceful shutdown across all processes

The supervisor acts as the resilience layer, making Ductwork pipelines fault-tolerant without manual intervention.

Each Ductwork instance creates the following process tree:

supervisor (bin/ductwork)
├── pipeline advancer
│ └── thread for Pipeline A
│ └── thread for Pipeline B
│ └── thread for Pipeline C
├── job worker (Pipeline A)
│ └── worker thread 1
│ └── worker thread 2
│ └── ...
├── job worker (Pipeline B)
│ └── worker thread 1
│ └── worker thread 2
│ └── ...
└── job worker (Pipeline C)
└── worker thread 1
└── worker thread 2
└── ...

The supervisor manages the complete lifecycle of child processes:

Startup:

  1. Read configuration from YAML file
  2. Fork the pipeline advancer process
  3. Fork one job worker process per configured pipeline
  4. Register signal handlers for graceful shutdown
  5. Enter monitoring loop

Monitoring:

  • Check heartbeats from each child process
  • Track process health and uptime
  • Detect crashes or hangs
  • Log process status changes

Recovery:

  • Automatically restart failed processes
  • Maintain pipeline availability during failures
  • Preserve pipeline state through process restarts

Shutdown:

  • Forward shutdown signals to all children
  • Wait for graceful shutdown with timeout
  • Terminate unresponsive processes
  • Clean up resources and exit

The supervisor continuously monitors child process health through periodic heartbeats. Each child process reports its status at regular intervals, confirming it’s alive and processing work.

Detection: If a child process fails to report a heartbeat within 5 minutes—indicating a crash, hang, or deadlock—the supervisor detects the failure.

Recovery: The supervisor immediately spawns a replacement process to restore full pipeline capacity. The new process picks up where the previous one left off, resuming work on pending jobs.

Why 5 minutes? This timeout balances quick failure detection with tolerance for legitimately slow operations. Steps should typically complete in seconds, but this buffer accounts for temporarily degraded performance without false positives.

The supervisor’s behavior is controlled through config/ductwork.yml:

Specifies which pipelines to run. The supervisor creates child processes based on this configuration.

default: &default
pipelines:
- EnrichUserDataPipeline
- ProcessOrdersPipeline

Or use the wildcard to run all defined pipelines:

default: &default
pipelines: "*"

Note: The supervisor creates one advancer and one job worker per pipeline listed here.

How long (in seconds) the supervisor sleeps between heartbeat checks.

Default: 1 second

default: &default
supervisor:
polling_timeout: 5

Tuning: Shorter intervals provide faster failure detection but increase CPU usage. Longer intervals reduce overhead but delay failure detection. The default (1 second) works well for most applications.

Maximum time (in seconds) to wait for child processes to shut down gracefully. After this timeout, remaining processes receive SIGKILL and terminate immediately.

Default: 30 seconds

default: &default
supervisor:
shutdown_timeout: 45

Important: This value should be larger than job_worker.shutdown_timeout to allow proper cascading. If the supervisor timeout is too short, workers won’t have time to finish their shutdown sequence.

Recommended values:

  • job_worker.shutdown_timeout: 20 seconds
  • supervisor.shutdown_timeout: 30 seconds (gives 10 seconds buffer)

The supervisor responds to Unix signals for control and debugging:

Triggers the graceful shutdown sequence:

  1. Supervisor forwards signal to all child processes
  2. Child processes begin their shutdown sequences
  3. Supervisor waits up to supervisor.shutdown_timeout seconds
  4. Processes still alive after timeout are killed with SIGKILL
  5. Supervisor exits
Terminal window
# Send TERM signal
kill -TERM <supervisor_pid>
# Or INT signal (both behave identically)
kill -INT <supervisor_pid>

See Signal Handling for detailed shutdown behavior.

Requests thread backtraces from all child processes for debugging hung or slow processes.

Terminal window
kill -TTIN <supervisor_pid>

The supervisor forwards this signal to all children, which dump their thread backtraces to the configured logger. This is invaluable for diagnosing performance issues or deadlocks in production.

See TTIN Signal Handling for details.

Register actions to run when the supervisor starts or stops:

config/initializers/ductwork.rb
Ductwork.on_supervisor_start do
Rails.logger.info "Ductwork supervisor starting"
# Initialize monitoring, notify deployment tracking, etc.
end
Ductwork.on_supervisor_stop do
Rails.logger.info "Ductwork supervisor shutting down"
# Flush metrics, notify monitoring systems, etc.
end

These hooks run once per supervisor lifecycle—at the very beginning of startup and the very end of shutdown. Use them for initialization, cleanup, or integration with external systems.

See Lifecycle Hooks for all available hooks.

Track supervisor health and behavior by monitoring:

  • Supervisor uptime
  • Number of child process restarts
  • Child process spawn rate
  • Failed startup attempts
  • Supervisor CPU and memory usage
  • Total memory across all child processes
  • Open file descriptors
  • Database connection count
  • Time since last heartbeat from each child
  • Heartbeat check frequency
  • Missed heartbeat count
  • Time to complete graceful shutdown
  • Number of processes killed after timeout
  • Shutdown success rate

You can run multiple bin/ductwork instances to isolate pipelines or scale horizontally:

Terminal window
# Critical pipelines with dedicated resources
bin/ductwork -c config/ductwork.critical.yml
# Background pipelines on separate instance
bin/ductwork -c config/ductwork.background.yml
config/ductwork.critical.yml
production:
pipelines:
- ProcessPaymentsPipeline
- SendNotificationsPipeline
job_worker:
worker_count: 20
# config/ductwork.background.yml
production:
pipelines:
- GenerateReportsPipeline
- CleanupDataPipeline
job_worker:
worker_count: 5

Run separate supervisors on different servers for horizontal scaling:

Terminal window
# Server 1 - Handle user-facing pipelines
bin/ductwork -c config/ductwork.user_facing.yml
# Server 2 - Handle batch processing pipelines
bin/ductwork -c config/ductwork.batch.yml

Benefits:

  • Fault isolation (one failing pipeline doesn’t affect others)
  • Resource allocation (dedicate CPU/memory to specific pipelines)
  • Independent scaling (scale critical pipelines without scaling everything)
  • Deployment flexibility (deploy changes to specific pipeline groups)

Considerations:

  • More operational complexity
  • Higher total resource usage (overhead per supervisor)
  • Need coordination for monitoring across instances

Integrate Ductwork with your process manager:

/etc/systemd/system/ductwork.service
[Unit]
Description=Ductwork Pipeline Supervisor
After=network.target postgresql.service
[Service]
Type=simple
User=deploy
WorkingDirectory=/var/www/myapp
ExecStart=/var/www/myapp/bin/ductwork -c config/ductwork.yml
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
# Dockerfile
CMD ["bin/ductwork", "-c", "config/ductwork.production.yml"]
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: ductwork
spec:
replicas: 2
template:
spec:
containers:
- name: ductwork
image: myapp:latest
command: ["bin/ductwork"]
args: ["-c", "config/ductwork.yml"]

The supervisor’s resilient design makes it suitable for containerized environments and orchestration platforms.