stackman - Docker Swarm stack orchestrator with health-aware deployment, intelligent rollback, and a Helm-like workflow.
A CLI tool written in Go that brings zero-downtime deployments, real-time health monitoring, and automatic
rollback to Docker Swarm, filling the gap between basic `docker stack deploy` and enterprise-grade deployment
automation.
Docker Swarm's `docker stack deploy` has a critical limitation: it returns immediately after submitting the
deployment request, without waiting for services to start or become healthy and without validating that the deployment succeeded.
This creates production risks:
| Problem | Impact | Example |
|---|---|---|
| No validation | Deployments appear successful even when they fail | CI/CD marks green, but service crashes |
| No health awareness | Broken services go unnoticed until users report them | Database migration fails, app starts anyway |
| Manual rollback | No automatic recovery from bad deployments | 3 AM page, manual investigation required |
| Silent failures | Task failures, health check failures ignored | Service dies repeatedly, no alerts |
| No deployment tracking | Can't tell when deployment actually completes | Is it done? Is it healthy? Unknown. |
stackman wraps the Docker Swarm API with deployment intelligence:
- Waits for deployment - Monitors service updates until all tasks are running
- Health validation - Ensures all containers pass health checks before success
- Automatic rollback - Reverts to previous state on failure or timeout
- Real-time visibility - Streams logs, events, and health status during deployment
- Production-ready - Signal handling, proper exit codes, CI/CD integration
- Task tracking - Monitors old task shutdown and new task startup with version control
stackman follows a deployment lifecycle pattern with automatic safety mechanisms:
1. PARSE & VALIDATE
   - Parse docker-compose.yml
   - Validate image tags (no `:latest` without `--allow-latest`)
   - Convert compose spec to Swarm ServiceSpec
2. SNAPSHOT (Rollback Preparation)
   - Capture current service specs (ServiceInspect)
   - Store service versions and task states
   - Record resources (networks, volumes, secrets, configs)
3. DEPLOYMENT
   - Remove obsolete services (`--prune`)
   - Pull images with progress tracking
   - Create/update networks (overlay)
   - Create/update volumes (local)
   - Deploy services with unique DeployID label
   - Track service version changes
4. HEALTH MONITORING (unless `--no-wait`)
   - Subscribe to Docker events (task lifecycle)
   - Start per-task monitors with log streaming
   - Track UpdateStatus.State → "completed"
   - Poll container health status (State.Health)
   - Wait for all tasks: Running + Healthy

All healthy?
- YES → SUCCESS: Exit 0
- NO / TIMEOUT / SIGINT → ROLLBACK: Restore specs, revert versions, wait healthy, Exit 1/2/130
- Compose parsing - YAML → internal model (no external compose libraries)
- Path resolution - Converts relative paths to absolute using `STACKMAN_WORKDIR`
- Validation - Checks `:latest` tag protection, required fields
- Templating - Applies `${VAR}` environment variable substitution
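
The `:latest` protection described above can be sketched in a few lines of Go. This is a minimal illustration, not stackman's actual code: `usesLatestTag` and `validateImage` are hypothetical names, and treating an untagged image as `:latest` is an assumption based on how Docker resolves references without a tag.

```go
package main

import (
	"fmt"
	"strings"
)

// usesLatestTag reports whether an image reference resolves to :latest,
// either explicitly or because no tag was given (hypothetical helper).
func usesLatestTag(image string) bool {
	// A digest pins the image exactly, so it never counts as "latest".
	if strings.Contains(image, "@") {
		return false
	}
	// Look for the tag only after the last "/", so a registry port such as
	// registry:5000/app is not mistaken for a tag.
	name := image[strings.LastIndex(image, "/")+1:]
	i := strings.LastIndex(name, ":")
	if i < 0 {
		return true // assumption: an untagged reference defaults to :latest
	}
	return name[i+1:] == "latest"
}

// validateImage rejects :latest images unless allowLatest is set.
func validateImage(image string, allowLatest bool) error {
	if usesLatestTag(image) && !allowLatest {
		return fmt.Errorf("image %q uses :latest; pass --allow-latest to permit it", image)
	}
	return nil
}

func main() {
	for _, img := range []string{"nginx:1.25-alpine", "nginx", "registry:5000/app:latest"} {
		fmt.Println(img, "->", validateImage(img, false))
	}
}
```

Checking the tag only in the last path segment is the detail that is easy to get wrong, since a naive split on `:` breaks for registries that include a port.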
- ServiceInspect - Captures current `Spec` and `Version.Index` for each service
- Resource inventory - Records existing networks, volumes, secrets, configs
- Rollback readiness - Ensures we can restore previous state on failure
- Image Pull - Pre-pulls all images (respects `DOCKER_CONFIG_PATH` for auth)
- Resource Creation - Networks → Volumes → Secrets → Configs (dependency order)
- Service Update - Uses `ServiceUpdate` API with current `Version.Index`
- DeployID Injection - Adds `com.stackman.deploy.id` label to all tasks for tracking
- Event Subscription - Listens to `type=task` events filtered by stack namespace
- Task Watchers - Spawns goroutine per task for log streaming and container inspection
- UpdateStatus Tracking - Waits for `UpdateStatus.State == "completed"`
- Health Polling - Periodic `ContainerInspect` checks `State.Health.Status == "healthy"`
- DeployID Filtering - Only monitors tasks with matching `com.stackman.deploy.id` label
- Trigger Conditions: Health timeout, task failures, SIGINT/SIGTERM
- Restoration: Applies previous `ServiceSpec` using saved `Version.Index`
- Health Re-check: Waits for rolled-back services to become healthy (with `--rollback-timeout`)
- Health-aware deployment - Waits for all services to become healthy before declaring success
- Automatic rollback - Reverts to previous state on failure, timeout, or interrupt (SIGINT/SIGTERM)
- Version-aware updates - Tracks service `Version.Index` to prevent update conflicts
- DeployID tracking - Injects unique deployment ID into all tasks for precise monitoring
- Image pre-pull - Pulls images before deployment with progress tracking
- Dependency-ordered deployment - Creates resources in correct order (networks → volumes → secrets → services)

- Event-driven architecture - Subscribes to Docker events (`type=task`) for instant task lifecycle updates
- Per-task monitoring - Spawns dedicated watcher goroutine for each task with log streaming
- UpdateStatus tracking - Monitors `ServiceInspectWithRaw` → `UpdateStatus.State == "completed"`
- Health polling - Periodic `ContainerInspect` checks `State.Health.Status`
- Failed task detection - Reports task failures with exit codes and error messages
- No healthcheck tolerance - Services without healthchecks are considered healthy if running

- Custom YAML parser - No external compose libraries (only `gopkg.in/yaml.v3`)
- Path resolution - Converts relative paths (`./data`) to absolute using `STACKMAN_WORKDIR` or CWD
- Environment substitution - Supports `${VAR}` syntax for environment variables
- Full Swarm spec mapping - Converts `deploy.replicas`, `deploy.update_config`, `deploy.placement`, etc.
- Resource support - Networks (overlay), Volumes (local), Secrets, Configs (parsing implemented)
- Healthcheck conversion - Maps `healthcheck` to `ContainerSpec.Healthcheck`

- Snapshot-based rollback - Captures `ServiceInspect` before deployment for safe revert
- Signal handling - Intercepts SIGINT/SIGTERM → triggers rollback → exits with code 130
- Timeout protection - `--timeout` for deployment, `--rollback-timeout` for rollback
- Image tag validation - Blocks `:latest` tag unless `--allow-latest` is set
- Idempotency - Repeated applies without changes result in no-op
- Concurrent-safe - Handles multiple goroutines for task monitoring with mutexes

- Multiple subcommands - `apply`, `rollback`, `diff`, `status`, `logs`, `events`
- CI/CD friendly - Proper exit codes (0=success, 1=failure, 2=timeout, 130=interrupted)
- TLS support - Respects `DOCKER_HOST`, `DOCKER_TLS_VERIFY`, `DOCKER_CERT_PATH`
- Registry authentication - Uses `DOCKER_CONFIG_PATH` for private registry auth (config.json)
- Parallel updates - `--parallel` flag for concurrent service updates (not yet fully implemented)
- No external dependencies - Only uses: `github.com/docker/docker`, `github.com/docker/go-units`, `golang.org/x/net`, `gopkg.in/yaml.v3`
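
To illustrate the `${VAR}` substitution listed above, here is a minimal sketch built on the standard library's `os.Expand`. The `substitute` helper is hypothetical, and its behavior is an assumption rather than a description of stackman's parser: `os.Expand` also expands the bare `$VAR` form and replaces unknown variables with an empty string.

```go
package main

import (
	"fmt"
	"os"
)

// substitute replaces ${NAME} references in s with values from vars.
// Variables missing from vars expand to the empty string.
func substitute(s string, vars map[string]string) string {
	return os.Expand(s, func(name string) string { return vars[name] })
}

func main() {
	vars := map[string]string{"VERSION": "v2.0"}
	fmt.Println(substitute("myapp:${VERSION}", vars))
}
```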
# Linux (amd64)
wget https://github.com/SomeBlackMagic/stackman/releases/latest/download/stackman-linux-amd64
chmod +x stackman-linux-amd64
sudo mv stackman-linux-amd64 /usr/local/bin/stackman
# macOS (amd64)
curl -L https://github.com/SomeBlackMagic/stackman/releases/latest/download/stackman-darwin-amd64 -o stackman
chmod +x stackman
sudo mv stackman /usr/local/bin/stackman

git clone https://github.com/SomeBlackMagic/stackman.git
cd stackman
go build -o stackman .
sudo mv stackman /usr/local/bin/stackman

make build    # Build binary
make install  # Install to /usr/local/bin

stackman version

stackman <command> [flags]

| Command | Description | Status |
|---|---|---|
| `apply` | Deploy or update a stack | Implemented |
| `rollback` | Rollback stack to previous state | Stub |
| `diff` | Show deployment plan without applying | Stub |
| `status` | Show current stack status | Stub |
| `logs` | Show logs for stack services | Stub |
| `events` | Show events for stack services | Stub |
| `version` | Show version information | Implemented |
Deploy or update a Docker Swarm stack with health monitoring and automatic rollback.
stackman apply -n <stack-name> -f <compose-file> [flags]

| Flag | Type | Default | Description |
|---|---|---|---|
| `-n, --name` | string | (required) | Stack name |
| `-f, --file` | string | (required) | Path to docker-compose.yml |
| `--values` | string | - | Values file for templating (not yet implemented) |
| `--set` | string | - | Set values (key=value pairs, not yet implemented) |
| `--timeout` | duration | `15m` | Deployment health check timeout |
| `--rollback-timeout` | duration | `10m` | Rollback timeout |
| `--no-wait` | bool | `false` | Don't wait for health checks |
| `--prune` | bool | `false` | Remove orphaned services |
| `--allow-latest` | bool | `false` | Allow `:latest` image tags |
| `--parallel` | int | `1` | Parallel service updates (not yet implemented) |
| `--logs` | bool | `true` | Stream container logs during deployment |
stackman apply -n mystack -f docker-compose.yml

stackman apply -n mystack -f docker-compose.yml --timeout 20m --rollback-timeout 5m

stackman apply -n mystack -f docker-compose.yml --no-wait

stackman apply -n mystack -f docker-compose.yml --prune

stackman apply -n mystack -f docker-compose.yml --allow-latest

stackman apply -n mystack -f docker-compose.yml --logs=false

export DOCKER_HOST=tcp://192.168.1.100:2376
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH=/path/to/certs
stackman apply -n mystack -f docker-compose.yml

# Ensure $HOME/.docker/config.json contains auth credentials
# Or set custom path:
export DOCKER_CONFIG_PATH=/etc/docker
stackman apply -n mystack -f docker-compose.yml

stackman reads configuration from environment variables:
| Variable | Description | Default | Example |
|---|---|---|---|
| `DOCKER_HOST` | Docker daemon socket | `unix:///var/run/docker.sock` | `tcp://192.168.1.100:2376` |
| `DOCKER_TLS_VERIFY` | Enable TLS verification | `0` | `1` |
| `DOCKER_CERT_PATH` | Path to TLS certificates | - | `/etc/docker/certs` |
| `DOCKER_CONFIG_PATH` | Path to Docker config directory (for registry auth) | `$HOME/.docker` | `/etc/docker` |

| Variable | Description | Default | Example |
|---|---|---|---|
| `STACKMAN_WORKDIR` | Base path for relative volume mounts | Current working directory | `/var/app/stacks/production` |
| `STACKMAN_DEPLOY_TIMEOUT` | Deployment timeout (overridden by `--timeout` flag) | `15m` | `20m` |
| `STACKMAN_ROLLBACK_TIMEOUT` | Rollback timeout (overridden by `--rollback-timeout` flag) | `10m` | `5m` |

| Variable | Description | Default | Example |
|---|---|---|---|
| `LOG_LEVEL` | Log verbosity (not yet implemented) | `info` | `debug` |
| `NO_COLOR` | Disable colored output (not yet implemented) | `false` | `true` |

Priority (highest to lowest):
- Command-line flags (e.g., `--timeout 20m`)
- Environment variables (e.g., `STACKMAN_DEPLOY_TIMEOUT=20m`)
- Default values (e.g., `15m` for timeout)
stackman follows standard Unix exit code conventions for CI/CD integration:
| Code | Meaning | Trigger Condition | Rollback Performed? |
|---|---|---|---|
| 0 | Success | All services deployed and healthy | N/A |
| 1 | Failure | Deployment failed (parse error, API error, validation failed) | Yes (if deployment started) |
| 2 | Timeout | Health check timeout reached | Yes |
| 3 | Rollback Failed | Deployment failed AND rollback also failed (as per spec) | Attempted, but failed |
| 4 | Connection Error | Docker API/Registry connection failed (as per spec) | N/A |
| 130 | Interrupted | User pressed Ctrl+C (SIGINT) or SIGTERM received | Yes |
# GitLab CI / GitHub Actions example
stackman apply -n production -f docker-compose.yml
EXIT_CODE=$?
if [ "$EXIT_CODE" -eq 0 ]; then
  echo "Deployment successful"
elif [ "$EXIT_CODE" -eq 1 ]; then
  echo "Deployment failed, rollback succeeded"
  exit 1
elif [ "$EXIT_CODE" -eq 2 ]; then
  echo "Deployment timeout, rollback succeeded"
  exit 1
elif [ "$EXIT_CODE" -eq 130 ]; then
  echo "Deployment interrupted, rollback succeeded"
  exit 1
else
  echo "Critical error (code $EXIT_CODE)"
  exit "$EXIT_CODE"
fi

| Requirement | Version | Purpose |
|---|---|---|
| Docker Engine | 19.03+ | Swarm API access |
| Docker Swarm | Initialized | docker swarm init |
| Operating System | Linux / macOS / Windows | Cross-platform |
| Architecture | amd64 / arm64 | Binary architecture |

| Requirement | Version | Purpose |
|---|---|---|
| Go | 1.24+ | Compiler toolchain |
| Make | (optional) | Build automation |

- Supported: Compose file format version 3.x (Swarm mode)
- Not Supported: Compose file version 2.x (standalone Docker)

- Recommended: Define `healthcheck` for all services for accurate deployment validation
- Optional: Services without healthcheck are considered healthy if task is in `running` state
- Best Practice: Use fast healthchecks (`interval: 5-10s`) with reasonable `start_period`

version: '3.8'
services:
  web:
    image: nginx:1.25-alpine
    healthcheck:
      test: [ "CMD", "wget", "-q", "--spider", "http://localhost" ]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 5s
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        delay: 10s
networks:
  default:
    driver: overlay

Before using stackman, ensure:
# 1. Docker is running
docker info
# 2. Swarm is initialized
docker swarm init
# 3. You are a swarm manager node
docker node ls
# 4. Your compose file is valid
docker-compose -f docker-compose.yml config
# 5. Required images are pullable (for private registries)
docker login registry.example.com

2025/11/06 14:33:05 Start Docker Stack Wait version=1.0.0 revision=abc123
2025/11/06 14:33:05 Parsing compose file: docker-compose.yml
2025/11/06 14:33:05 Creating snapshot of current stack state...
2025/11/06 14:33:06 Snapshotted service: mystack_web (version 42)
2025/11/06 14:33:06 Snapshotted service: mystack_api (version 38)
2025/11/06 14:33:06 Snapshot created with 2 services
2025/11/06 14:33:06 Starting deployment of stack: mystack
2025/11/06 14:33:06 No obsolete services to remove
2025/11/06 14:33:06 Pulling image: nginx:latest
2025/11/06 14:33:08 Image nginx:latest pulled successfully
2025/11/06 14:33:08 Network mystack_default already exists
2025/11/06 14:33:08 Updating service: mystack_web
2025/11/06 14:33:09 Service mystack_web updated, waiting for tasks to be recreated...
[event:service:mystack_web] update
[event:container:mystack_web.1.xyz] start
Stack deployed successfully. Starting health checks...
2025/11/06 14:33:12 Starting log streaming for 2 services...
Waiting for service tasks to start...
2025/11/06 14:33:15 Waiting for services to become healthy...
2025/11/06 14:33:20 Container statuses: mystack_web.1: running/starting, mystack_api.1: running/healthy
[event:container:mystack_web.1.xyz] healthcheck passed (exit 0): curl -f http://localhost
2025/11/06 14:33:25 Container statuses: mystack_web.1: running/healthy, mystack_api.1: running/healthy
2025/11/06 14:33:25 All containers are healthy (checked 2 containers)
All containers healthy.
2025/11/06 14:45:10 Start Docker Stack Wait version=1.0.0 revision=abc123
2025/11/06 14:45:10 Parsing compose file: docker-compose.yml
2025/11/06 14:45:10 Creating snapshot of current stack state...
2025/11/06 14:45:11 Starting deployment of stack: mystack
2025/11/06 14:45:11 Pulling image: myapp:broken-version
2025/11/06 14:45:15 Updating service: mystack_api
[event:service:mystack_api] update
[event:container:mystack_api.1.abc] start
[event:container:mystack_api.1.abc] healthcheck failed (exit 1): curl -f http://localhost/health
2025/11/06 14:45:30 ERROR: Service mystack_api task abc123def456 failed with state shutdown (desired: shutdown)
2025/11/06 14:45:30 ERROR: New task abc123def456 failed with state complete (desired: shutdown): task: non-zero exit (1)
Container exit code: 1
Task was shutdown and replaced (likely healthcheck failure)
ERROR: Services failed healthcheck or didn't start in time.
Starting rollback to previous state...
2025/11/06 14:45:31 Rolling back stack: mystack
2025/11/06 14:45:31 Rolling back service: mystack_api to version 38
2025/11/06 14:45:32 Service mystack_api rolled back successfully
2025/11/06 14:45:32 Rollback completed for stack: mystack
Rollback completed successfully
2025/11/06 14:50:15 Start Docker Stack Wait version=1.0.0 revision=abc123
2025/11/06 14:50:15 Deploying stack: mystack
[event:service:mystack_web] update
^C
2025/11/06 14:50:20 Received signal: interrupt
2025/11/06 14:50:20 Deployment interrupted, initiating rollback...
Starting rollback to previous state...
2025/11/06 14:50:21 Rolling back stack: mystack
2025/11/06 14:50:22 Service mystack_web rolled back successfully
Rollback completed successfully
stackman follows a clean, modular architecture aligned with the technical specification:
stackman/
├── main.go                          # Entry point and CLI orchestration
├── cmd/                             # CLI commands (cobra-like structure)
│   ├── root.go                      # Command router and usage
│   ├── apply.go                     # apply command (implemented)
│   ├── rollback.go                  # rollback command (stub)
│   ├── logs.go                      # logs command (stub)
│   ├── events.go                    # events command (stub)
│   ├── stubs.go                     # Stub implementations for incomplete commands
│   └── version.go                   # version command (implemented)
├── internal/                        # Internal packages (not importable externally)
│   ├── compose/                     # [implemented] Compose file parsing (no external libs)
│   │   ├── types.go                 # Compose spec types (services, networks, volumes, etc.)
│   │   ├── parser.go                # YAML → ComposeSpec parser (gopkg.in/yaml.v3)
│   │   └── converter.go             # ComposeSpec → Swarm ServiceSpec converter
│   ├── swarm/                       # [implemented] Docker Swarm API client wrapper
│   │   ├── interface.go             # StackDeployer interface
│   │   ├── stack.go                 # Stack deployment orchestration
│   │   ├── services.go              # Service create/update logic
│   │   ├── images.go                # Image pull with progress tracking
│   │   ├── networks.go              # Network create/inspect logic
│   │   ├── volumes.go               # Volume create/inspect logic
│   │   ├── cleanup.go               # Obsolete service removal
│   │   ├── rollback.go              # Rollback execution
│   │   └── state.go                 # Current swarm state reading
│   ├── health/                      # [implemented] Health monitoring and event handling
│   │   ├── watcher.go               # Event-driven task watcher (Docker events API)
│   │   ├── monitor.go               # Per-task monitor with log streaming
│   │   ├── events.go                # Event types and subscription
│   │   └── service_update_monitor.go  # ServiceInspect UpdateStatus tracker
│   ├── snapshot/                    # [implemented] Snapshot creation and restoration
│   │   └── snapshot.go              # ServiceInspect capture and rollback
│   ├── deployment/                  # [implemented] Deployment ID generation
│   │   └── id.go                    # Unique deployment ID (timestamp-based)
│   ├── paths/                       # [implemented] Path resolution logic
│   │   └── resolver.go              # STACKMAN_WORKDIR + relative → absolute conversion
│   ├── plan/                        # [partial] Diff and deployment plan (partially implemented)
│   │   ├── types.go                 # Plan types (Create/Update/Delete)
│   │   ├── planner.go               # Diff logic (current vs desired state)
│   │   └── formatter.go             # Plan output formatting
│   ├── apply/                       # [planned] Apply orchestration (currently in cmd/apply.go)
│   ├── rollback/                    # [planned] Rollback orchestration (currently in snapshot/)
│   ├── signals/                     # [planned] SIGINT/SIGTERM handling (currently in cmd/apply.go)
│   └── output/                      # [planned] Structured output and formatting
├── tests/                           # [implemented] Integration tests
│   ├── apply_test.go                # Full deployment cycle tests
│   ├── health_test.go               # Health check monitoring tests
│   ├── resources_test.go            # Networks, volumes, secrets, configs tests
│   ├── operations_test.go           # Prune, signals tests
│   ├── negative_test.go             # Error handling tests
│   ├── helpers_test.go              # Test utilities
│   └── testdata/                    # Test compose files
└── docs/                            # Documentation
    ├── stackman-diff-techspec-full.md  # Full technical specification
    ├── TEST_RESULTS.md              # Test validation results
    └── TESTING.md                   # Testing guide

| Package | Responsibility | Status |
|---|---|---|
| `cmd/` | CLI interface, argument parsing, command routing | Core commands implemented |
| `internal/compose/` | Parse docker-compose.yml → internal model | Fully implemented |
| `internal/swarm/` | Docker Swarm API operations (services, tasks, networks, volumes) | Core operations implemented |
| `internal/health/` | Event-driven task monitoring, health checks, log streaming | Fully implemented |
| `internal/snapshot/` | Capture and restore service state for rollback | Implemented |
| `internal/deployment/` | Generate unique deployment IDs for task tracking | Implemented |
| `internal/paths/` | Resolve relative paths to absolute using `STACKMAN_WORKDIR` | Implemented |
| `internal/plan/` | Diff current vs desired state, generate deployment plan | Partially implemented |
| `internal/apply/` | High-level apply orchestration | To be extracted from `cmd/apply.go` |
| `internal/rollback/` | High-level rollback orchestration | To be extracted from `snapshot/` |
| `internal/signals/` | SIGINT/SIGTERM handling with context propagation | To be extracted from `cmd/apply.go` |
| `internal/output/` | Structured logging, JSON output, progress formatting | Planned |

stackman includes a comprehensive Docker Compose parser that converts docker-compose.yml files to Docker Swarm
service specifications.
- Images & Build: `image`, `build` (context, dockerfile, args, target, cache_from)
- Commands: `command`, `entrypoint`
- Environment: `environment` (array and map formats), `env_file`
- Container Settings: `hostname`, `domainname`, `user`, `working_dir`, `stdin_open`, `tty`, `read_only`, `init`
- Lifecycle: `stop_signal`, `stop_grace_period`, `restart`

- Ports: Short syntax (`"8080:80"`) and long syntax (with mode and protocol)
- Networks: Network attachment with aliases
- DNS: `dns`, `dns_search`, `dns_opt`
- Hosts: `extra_hosts`, `mac_address`

- Volumes: Bind mounts with automatic relative → absolute path conversion
- Named Volumes: Volume references from top-level `volumes:` section
- Tmpfs: Temporary filesystem mounts

- Test Commands: CMD-SHELL and exec array formats
- Timing: `interval`, `timeout`, `retries`, `start_period`
- Control: `disable` flag

- Mode: `replicated` (with replica count) or `global`
- Updates: Parallelism, delay, order, failure action, monitor period, max failure ratio
- Rollback: Same configuration as updates
- Resources: CPU and memory limits/reservations
- Restart Policy: Condition, delay, max attempts, window
- Placement: Node constraints, spread preferences, max replicas per node

- Capabilities: `cap_add`, `cap_drop`
- Devices: Device mappings
- Isolation: Container isolation technology

- Services: Complete service definitions
- Networks: Custom networks with driver options, IPAM config
- Volumes: Named volumes with driver options
- Secrets: File or external secrets (parsed, creation not implemented)
- Configs: File or external configs (parsed, creation not implemented)
Some Docker Compose fields are parsed but not applied due to Docker Swarm API restrictions:

| Field | Reason |
|---|---|
| `privileged` | Not supported in Swarm mode |
| `security_opt` | Not available in Swarm ContainerSpec |
| `sysctls` | Not available in Swarm ContainerSpec |
| `ulimits` | Not available in Swarm ContainerSpec |
| `links`, `external_links` | Deprecated in favor of networks |
| `depends_on` | No start order control in Swarm |

These fields remain in the type definitions for completeness and potential future use.

# GitLab CI example
deploy:
  stage: deploy
  script:
    - stackman apply -n production -f docker-compose.yml --timeout 10m --rollback-timeout 5m
  only:
    - main

# Deploy to green environment
stackman apply -n green-stack -f docker-compose.yml
# If successful, switch traffic and deploy to blue
stackman apply -n blue-stack -f docker-compose.yml

# docker-compose.yml
services:
  web:
    image: myapp:${VERSION}
    deploy:
      replicas: 1  # Start with 1 replica
      update_config:
        parallelism: 1
        delay: 30s

# Deploy canary
VERSION=v2.0 stackman apply -n mystack -f docker-compose.yml
# If healthy, scale up
docker service scale mystack_web=10

#!/bin/bash
# deploy.sh
ENVIRONMENTS=("dev" "staging" "production")
COMPOSE_FILE="docker-compose.yml"
for ENV in "${ENVIRONMENTS[@]}"; do
  echo "Deploying to $ENV..."
  if stackman apply -n "${ENV}-stack" -f "$COMPOSE_FILE" --timeout 15m --rollback-timeout 3m; then
    echo "$ENV deployment successful"
  else
    echo "$ENV deployment failed, stopping"
    exit 1
  fi
done

Symptoms:
[HealthCheck] ⏳ Task abc123 (mystack_web) is starting
ERROR: timeout after 15m waiting for services to become healthy

Root Causes:
- Health check `start_period` too short for slow-starting apps
- Health check command timing out or failing
- Insufficient resources (CPU/memory) causing slow startup

Solutions:
- Increase deployment timeout:
  stackman apply -n mystack -f docker-compose.yml --timeout 30m
- Adjust healthcheck in compose file:
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost/health"]
    interval: 10s
    timeout: 5s
    retries: 3
    start_period: 60s  # Increase if app needs more startup time
- Test healthcheck command manually:
  docker exec <container-id> curl -f http://localhost/health

Symptoms:
[ServiceMonitor] Service mystack_api: Task xyz789 failed - task: non-zero exit (1)
ERROR: New task failed with state complete (desired: shutdown): task: non-zero exit (1)

Root Causes:
- Application crash on startup
- Health check failing repeatedly
- Missing environment variables or secrets
- Configuration errors

Solutions:
- Check service logs:
  docker service logs mystack_servicename --tail 100
- Inspect task details:
  docker service ps mystack_servicename --no-trunc
- Test container locally (before swarm deployment):
  docker run --rm myimage:tag
- Check for missing config/secrets:
  docker config ls
  docker secret ls

Symptoms:
Starting rollback to previous state...
[ServiceMonitor] Service mystack_web: Task def456 failed
ERROR: Rollback failed: services did not become healthy after rollback

Root Cause: Previous version has underlying health issues (database schema mismatch, missing dependencies, etc.)

Solutions:
- Deploy a known-good version (not rollback):
  # Use older compose file with working version
  stackman apply -n mystack -f docker-compose.v1.2.3.yml
- Fix health check in previous version:
  # Temporarily disable strict health checks
  healthcheck:
    test: ["CMD", "true"]  # Always passes
- Manual intervention required:
  # Remove stack completely
  docker stack rm mystack
  # Wait for cleanup
  sleep 30
  # Deploy known-good version
  stackman apply -n mystack -f docker-compose.good.yml

Symptoms:
ERROR: No services found for stack: mystack

Root Causes:
- Stack name mismatch
- Swarm mode not initialized
- Wrong Docker daemon context

Solutions:
- Verify swarm is initialized:
  docker info | grep Swarm
  # Should show: "Swarm: active"
  # If not active:
  docker swarm init
- List existing stacks:
  docker stack ls
- Check Docker context:
  docker context ls
  docker context use default

Symptoms:
ERROR: failed to pull image registry.example.com/myapp:latest: unauthorized

Solutions:
- Login to registry:
  docker login registry.example.com
- Set Docker config path:
  export DOCKER_CONFIG_PATH=$HOME/.docker
  stackman apply -n mystack -f docker-compose.yml
- Verify auth config:
  cat $HOME/.docker/config.json
  # Should contain "auths": { "registry.example.com": { "auth": "..." } }

Symptoms:
[HealthCheck] ⏳ Task ghi012 (mystack_db) is preparing
(repeats indefinitely)

Root Causes:
- Image pull in progress (large images)
- Node resource constraints
- Network issues

Solutions:
- Check task status:
  docker service ps mystack_servicename --no-trunc
- Check node resources:
  docker node ls
  docker node inspect <node-id> --format '{{.Status}}'
- Manually pull image on node:
  # SSH to swarm node
  docker pull myregistry/myimage:tag

Symptom: Command doesn't exit after deployment

Root Cause: Waiting for health checks (expected behavior)

Solutions:
- Use `--no-wait` if you don't want to wait:
  stackman apply -n mystack -f docker-compose.yml --no-wait
- Check health check status in another terminal:
  docker service ps mystack_servicename

# View real-time events
docker events --filter type=container --filter type=service
# Inspect service configuration
docker service inspect mystack_servicename --pretty
# Check service update status
docker service inspect mystack_servicename --format '{{.UpdateStatus}}'
# View task history (including failed tasks)
docker service ps mystack_servicename --no-trunc
# Get container logs for specific task
docker service logs mystack_servicename --tail 100 --follow

This is a comprehensive roadmap aligned with the technical specification. Items are prioritized by importance and complexity.
- Snapshot-based rollback - Capture service state before deployment
- Event-driven task monitoring - Subscribe to Docker events for task lifecycle
- Health status polling - Check `ContainerInspect.State.Health`
- UpdateStatus tracking - Wait for `ServiceInspect.UpdateStatus.State == completed`
- DeployID injection - Add `com.stackman.deploy.id` label to tasks
- Signal handling - SIGINT/SIGTERM → rollback
- Exit code alignment - Implement codes 2 (timeout), 3 (rollback failed), 4 (connection error)
- Reconciliation loop - Periodic task list refresh to catch missed events

- YAML parser - Parse docker-compose.yml with `gopkg.in/yaml.v3`
- Path resolution - Convert relative paths using `STACKMAN_WORKDIR`
- Environment substitution - Support `${VAR}` syntax
- Service spec conversion - Map `deploy.*` to Swarm `ServiceSpec`
- Secrets creation - Implement `docker secret create` from `secrets:` section
- Configs creation - Implement `docker config create` from `configs:` section
- Templating engine - Implement `--values` and `--set` (basic key-value replacement)

- Network creation - Create overlay networks from `networks:` section
- Volume creation - Create local volumes from `volumes:` section
- Secrets handling - Full lifecycle (create, update, attach to services)
- Configs handling - Full lifecycle (create, update, attach to services)
- External resources - Respect `external: true` flag (skip create/delete)
- Resource pruning - Implement `--prune` for orphaned networks/volumes/secrets/configs

- apply command - Main deployment workflow (implemented)
- diff command - Show deployment plan without applying
- status command - Show current stack status with health info
- logs command - Stream logs from stack services with filters
- events command - Stream Docker events filtered by stack
- rollback command - Standalone rollback from saved snapshot

- Diff-based planning - Only update services that actually changed (use `internal/plan`)
- Parallel service updates - Implement `--parallel` flag for concurrent updates
- Dependency ordering - Respect `depends_on` for deployment order (best-effort)
- Smart rollback decision - Only rollback changed services, not entire stack
- Update progress tracking - Real-time progress bar with task counts

- :latest tag blocking - Require `--allow-latest` flag
- Conflict detection - Check for name conflicts in resources
- Version conflict handling - Retry `ServiceUpdate` on `Version.Index` race condition
- Secret content masking - Never log secret data
- Dry-run mode - `--dry-run` flag to show plan without applying

- Structured logging - Implement `internal/output` with log levels
- JSON output mode - `--json` flag for machine-readable output
- Progress formatting - Pretty progress bars and status tables
- Deployment metrics - Report deployment duration, task counts, health check times
- Audit log - Log all actions (create, update, delete) with timestamps

- Config file support - `.stackman.yaml` for default settings
- Multiple Docker hosts - `--docker-host` flag for remote deployments
- TLS configuration - Full support for `--tls`, `--cert-path`, `--tlsverify`
- Context switching - Respect Docker contexts

- Retry logic - Exponential backoff for API failures
- Timeout configurability - Per-service timeout overrides
- Graceful degradation - Continue deployment if non-critical services fail
- Connection pooling - Optimize Docker API client usage

- Unit tests - Core logic (compose parser, path resolver, health monitor)
- Integration tests - Full deployment cycle with real Swarm
- Negative tests - Error handling (API failures, timeout, invalid compose)
- Rollback tests - Verify rollback correctness
- Signal tests - SIGINT/SIGTERM handling
- Secrets/configs tests - Lifecycle testing
- Performance tests - Large stacks (20+ services)

- README - Comprehensive usage guide
- Technical spec - Full architecture document
- API documentation - GoDoc for all packages
- Examples - Real-world compose files (with secrets, configs, multiple networks)
- Troubleshooting guide - Common errors and solutions (added to README)
- Migration guide - From `docker stack deploy` to `stackman`

- Auto-completion - Bash/Zsh/Fish completion scripts
- Plugin system - Custom hooks (pre-deploy, post-deploy, on-failure)
- Remote snapshots - Store snapshots in S3/registry for team collaboration
- Canary deployments - Built-in support for gradual rollouts
- Blue-green deployments - Automated traffic switching
- Notification integrations - Slack/Discord/PagerDuty webhooks
- Web UI - Optional web dashboard for stack status
- Multi-stack orchestration - Deploy multiple stacks with dependencies

- Race condition in event handling - Rare: events may be missed if subscription starts after task creation (workaround: reconciliation loop)
- Large image pull timeout - No streaming progress for image pull in logs
- No task restart limit - Swarm may restart failed tasks indefinitely
- Health check log truncation - Long health check output is truncated
- Parallel updates not implemented - `--parallel` flag parsed but not used

| Category | Implemented | Planned | Total | % Complete |
|---|---|---|---|---|
| Core Deployment | 9 | 3 | 12 | 75% |
| Compose Support | 4 | 3 | 7 | 57% |
| Resources | 2 | 5 | 7 | 29% |
| Commands | 2 | 5 | 7 | 29% |
| Safety & Validation | 1 | 4 | 5 | 20% |
| Testing | 2 | 6 | 8 | 25% |
| TOTAL | 20 | 26 | 46 | 43% |
If you want to contribute, focus on these high-impact items:
- Exit code alignment - Easy win, improves CI/CD integration
- Secrets/Configs creation - Required for full compose parity
- diff command - Highly requested, relatively simple
- JSON output mode - Enables advanced tooling integration
- Reconciliation loop - Improves reliability
See CONTRIBUTING.md for development guidelines.
Contributions are welcome! Please feel free to submit issues or pull requests.
# Clone repository
git clone https://github.com/SomeBlackMagic/stackman.git
cd stackman
# Install dependencies
go mod download
# Run tests
go test ./...
# Build
go build -o stackman .

Licensed under the GPL-3.0 license. See LICENSE for details.
Developed by SomeBlackMagic
Built with:
- Docker Engine API
- Go
- gopkg.in/yaml.v3 for YAML parsing