SomeBlackMagic/stackman


stackman is a Docker Swarm stack orchestrator with health-aware deployment, intelligent rollback, and a Helm-like workflow.

A CLI tool written in Go that brings zero-downtime deployments, real-time health monitoring, and automatic rollback to Docker Swarm, filling the gap between basic docker stack deploy and enterprise-grade deployment automation.


The Problem It Solves

Docker Swarm's docker stack deploy has a critical limitation: it returns immediately after submitting the deployment request, without waiting for services to start, become healthy, or otherwise confirm that the deployment succeeded. This creates production risks:

Problems with docker stack deploy

| Problem | Impact | Example |
| --- | --- | --- |
| No validation | Deployments appear successful even when they fail | CI/CD marks green, but service crashes |
| No health awareness | Broken services go unnoticed until users report them | Database migration fails, app starts anyway |
| Manual rollback | No automatic recovery from bad deployments | 3 AM page, manual investigation required |
| Silent failures | Task failures, health check failures ignored | Service dies repeatedly, no alerts |
| No deployment tracking | Can't tell when deployment actually completes | Is it done? Is it healthy? Unknown. |

How stackman Solves This

stackman wraps the Docker Swarm API with deployment intelligence:

  • ✅ Waits for deployment - Monitors service updates until all tasks are running
  • ✅ Health validation - Ensures all containers pass health checks before success
  • ✅ Automatic rollback - Reverts to previous state on failure or timeout
  • ✅ Real-time visibility - Streams logs, events, and health status during deployment
  • ✅ Production-ready - Signal handling, proper exit codes, CI/CD integration
  • ✅ Task tracking - Monitors old task shutdown and new task startup with version control


How It Works

stackman follows a deployment lifecycle pattern with automatic safety mechanisms:

┌──────────────────────────────────────────────────────────────────┐
│ 1. PARSE & VALIDATE                                              │
│    • Parse docker-compose.yml                                    │
│    • Validate image tags (no :latest without --allow-latest)     │
│    • Convert compose spec to Swarm ServiceSpec                   │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│ 2. SNAPSHOT (Rollback Preparation)                               │
│    • Capture current service specs (ServiceInspect)              │
│    • Store service versions and task states                      │
│    • Record resources (networks, volumes, secrets, configs)      │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│ 3. DEPLOYMENT                                                    │
│    • Remove obsolete services (--prune)                          │
│    • Pull images with progress tracking                          │
│    • Create/update networks (overlay)                            │
│    • Create/update volumes (local)                               │
│    • Deploy services with unique DeployID label                  │
│    • Track service version changes                               │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│ 4. HEALTH MONITORING (unless --no-wait)                          │
│    • Subscribe to Docker events (task lifecycle)                 │
│    • Start per-task monitors with log streaming                  │
│    • Track UpdateStatus.State → "completed"                      │
│    • Poll container health status (State.Health)                 │
│    • Wait for all tasks: Running + Healthy                       │
└──────────────────────────────────────────────────────────────────┘
                              ↓
                    ┌─────────────────┐
                    │ All healthy?    │
                    └─────────────────┘
                       ↙           ↘
                  YES               NO/TIMEOUT/SIGINT
                   ↓                 ↓
          ┌──────────────┐    ┌──────────────────┐
          │ ✅ SUCCESS   │    │ ⚠️ ROLLBACK      │
          │ Exit 0       │    │ • Restore specs  │
          └──────────────┘    │ • Revert versions│
                              │ • Wait healthy   │
                              │ Exit 1/2/130     │
                              └──────────────────┘

Deployment Process Details

Phase 1: Pre-Deployment

  • Compose parsing - YAML → internal model (no external compose libraries)
  • Path resolution - Converts relative paths to absolute using STACKMAN_WORKDIR
  • Validation - Checks :latest tag protection, required fields
  • Templating - Applies ${VAR} environment variable substitution
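
The templating and path-resolution steps above can be illustrated with a minimal Go sketch. It is not stackman's actual code: `substitute` and `resolvePath` are hypothetical helpers, and `os.Expand` is used as a stand-in that also accepts the bare `$VAR` form and replaces unknown names with an empty string, which may differ from the real parser.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// substitute expands ${VAR} references in s from the given table.
// Assumption: unknown variables expand to "" (os.Expand behaviour).
func substitute(s string, vars map[string]string) string {
	return os.Expand(s, func(name string) string { return vars[name] })
}

// resolvePath (hypothetical) makes a relative host path absolute against
// workdir, falling back to the current directory when workdir is empty.
func resolvePath(workdir, p string) (string, error) {
	if filepath.IsAbs(p) {
		return filepath.Clean(p), nil
	}
	if workdir == "" {
		cwd, err := os.Getwd()
		if err != nil {
			return "", err
		}
		workdir = cwd
	}
	return filepath.Join(workdir, p), nil
}

func main() {
	vars := map[string]string{"VERSION": os.Getenv("VERSION")}
	fmt.Println(substitute("myapp:${VERSION}", vars))

	abs, err := resolvePath(os.Getenv("STACKMAN_WORKDIR"), "./data")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(abs)
}
```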

Phase 2: Snapshotting

  • ServiceInspect - Captures current Spec and Version.Index for each service
  • Resource inventory - Records existing networks, volumes, secrets, configs
  • Rollback readiness - Ensures we can restore previous state on failure

Phase 3: Deployment Execution

  1. Image Pull - Pre-pulls all images (respects DOCKER_CONFIG_PATH for auth)
  2. Resource Creation - Networks → Volumes → Secrets → Configs (dependency order)
  3. Service Update - Uses ServiceUpdate API with current Version.Index
  4. DeployID Injection - Adds com.stackman.deploy.id label to all tasks for tracking
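
Passing the current Version.Index with each update is a form of optimistic concurrency: an update carrying a stale index is rejected instead of silently overwriting a newer spec. The sketch below models that idea with an in-memory stand-in; `fakeSwarm`, `service`, and `errOutOfSequence` are illustrative names, not part of the Docker client or of stackman.

```go
package main

import (
	"errors"
	"fmt"
)

// errOutOfSequence models the rejection of an update that carries a stale index.
var errOutOfSequence = errors.New("update out of sequence")

type service struct {
	index uint64 // plays the role of Version.Index
	image string
}

// fakeSwarm is an in-memory stand-in for the Swarm manager (illustration only).
type fakeSwarm struct {
	services map[string]*service
}

// inspect returns the current index and image, like a ServiceInspect call would.
func (f *fakeSwarm) inspect(name string) (uint64, string, error) {
	svc, ok := f.services[name]
	if !ok {
		return 0, "", fmt.Errorf("service %q not found", name)
	}
	return svc.index, svc.image, nil
}

// update applies the new image only if the caller saw the latest index.
func (f *fakeSwarm) update(name string, index uint64, image string) error {
	svc, ok := f.services[name]
	if !ok {
		return fmt.Errorf("service %q not found", name)
	}
	if svc.index != index {
		return errOutOfSequence
	}
	svc.image = image
	svc.index++
	return nil
}

func main() {
	swarm := &fakeSwarm{services: map[string]*service{
		"mystack_web": {index: 42, image: "nginx:1.24-alpine"},
	}}

	index, _, err := swarm.inspect("mystack_web")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("first update:", swarm.update("mystack_web", index, "nginx:1.25-alpine"))
	// Reusing the old index simulates a concurrent writer and is rejected.
	fmt.Println("stale update:", swarm.update("mystack_web", index, "nginx:1.25-alpine"))
}
```

Saving the index at snapshot time serves the same purpose during rollback: the restore is applied against a known version rather than whatever happens to be current.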

Phase 4: Health Monitoring

  • Event Subscription - Listens to type=task events filtered by stack namespace
  • Task Watchers - Spawns goroutine per task for log streaming and container inspection
  • UpdateStatus Tracking - Waits for UpdateStatus.State == "completed"
  • Health Polling - Periodic ContainerInspect checks State.Health.Status == "healthy"
  • DeployID Filtering - Only monitors tasks with matching com.stackman.deploy.id label

Phase 5: Rollback (on failure)

  • Trigger Conditions: Health timeout, task failures, SIGINT/SIGTERM
  • Restoration: Applies previous ServiceSpec using saved Version.Index
  • Health Re-check: Waits for rolled-back services to become healthy (with --rollback-timeout)

Key Features

🚀 Deployment Intelligence

  • ✅ Health-aware deployment - Waits for all services to become healthy before declaring success
  • ✅ Automatic rollback - Reverts to previous state on failure, timeout, or interrupt (SIGINT/SIGTERM)
  • ✅ Version-aware updates - Tracks service Version.Index to prevent update conflicts
  • ✅ DeployID tracking - Injects unique deployment ID into all tasks for precise monitoring
  • ✅ Image pre-pull - Pulls images before deployment with progress tracking
  • ✅ Dependency-ordered deployment - Creates resources in correct order (networks → volumes → secrets → services)
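
DeployID tracking amounts to stamping each deployment with a unique label value and later ignoring tasks that carry a different value. A minimal sketch follows, assuming a nanosecond timestamp as the ID format (this README only says the ID is timestamp-based); `newDeployID`, `withDeployID`, and `belongsToDeploy` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// deployIDLabel is the label key named in this README.
const deployIDLabel = "com.stackman.deploy.id"

// newDeployID derives an ID from a timestamp. The exact format is an assumption.
func newDeployID(now time.Time) string {
	return strconv.FormatInt(now.UnixNano(), 10)
}

// withDeployID returns a copy of labels with the deploy ID set,
// leaving the caller's map untouched.
func withDeployID(labels map[string]string, id string) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	out[deployIDLabel] = id
	return out
}

// belongsToDeploy reports whether a task's labels match the given deployment.
func belongsToDeploy(labels map[string]string, id string) bool {
	return id != "" && labels[deployIDLabel] == id
}

func main() {
	id := newDeployID(time.Now())
	current := withDeployID(map[string]string{"com.docker.stack.namespace": "mystack"}, id)
	previous := withDeployID(nil, "older-deploy")

	// Only tasks from the current deployment should be monitored.
	fmt.Println(belongsToDeploy(current, id), belongsToDeploy(previous, id))
}
```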

🔍 Real-Time Monitoring

  • ✅ Event-driven architecture - Subscribes to Docker events (type=task) for instant task lifecycle updates
  • ✅ Per-task monitoring - Spawns dedicated watcher goroutine for each task with log streaming
  • ✅ UpdateStatus tracking - Monitors ServiceInspectWithRaw → UpdateStatus.State == "completed"
  • ✅ Health polling - Periodic ContainerInspect checks State.Health.Status
  • ✅ Failed task detection - Reports task failures with exit codes and error messages
  • ✅ No healthcheck tolerance - Services without healthchecks are considered healthy if running
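
Per-task monitoring with one goroutine per task needs shared state that is safe to write concurrently. The sketch below shows one common way to do that in Go with a mutex-guarded map and a WaitGroup; `statusBoard` and `watchAll` are illustrative names, and the watcher function is injected rather than talking to Docker.

```go
package main

import (
	"fmt"
	"sync"
)

// statusBoard collects the latest observed state per task.
type statusBoard struct {
	mu     sync.Mutex
	states map[string]string
}

func newStatusBoard() *statusBoard {
	return &statusBoard{states: make(map[string]string)}
}

// set records a task state; safe for concurrent use.
func (b *statusBoard) set(task, state string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.states[task] = state
}

// snapshot returns a copy so callers never read the map while it is written.
func (b *statusBoard) snapshot() map[string]string {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := make(map[string]string, len(b.states))
	for k, v := range b.states {
		out[k] = v
	}
	return out
}

// watchAll starts one watcher goroutine per task and waits for all of them.
func watchAll(tasks []string, watch func(task string) string) map[string]string {
	board := newStatusBoard()
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		go func(task string) {
			defer wg.Done()
			board.set(task, watch(task))
		}(task)
	}
	wg.Wait()
	return board.snapshot()
}

func main() {
	states := watchAll([]string{"mystack_web.1", "mystack_api.1"}, func(task string) string {
		return "running" // a real watcher would inspect the task here
	})
	fmt.Println(len(states), "tasks observed")
}
```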

📦 Compose File Support

  • ✅ Custom YAML parser - No external compose libraries (only gopkg.in/yaml.v3)
  • ✅ Path resolution - Converts relative paths (./data) to absolute using STACKMAN_WORKDIR or CWD
  • ✅ Environment substitution - Supports ${VAR} syntax for environment variables
  • ✅ Full Swarm spec mapping - Converts deploy.replicas, deploy.update_config, deploy.placement, etc.
  • ✅ Resource support - Networks (overlay), Volumes (local), Secrets, Configs (parsing implemented)
  • ✅ Healthcheck conversion - Maps healthcheck to ContainerSpec.Healthcheck

🛡️ Safety & Reliability

  • ✅ Snapshot-based rollback - Captures ServiceInspect before deployment for safe revert
  • ✅ Signal handling - Intercepts SIGINT/SIGTERM → triggers rollback → exits with code 130
  • ✅ Timeout protection - --timeout for deployment, --rollback-timeout for rollback
  • ✅ Image tag validation - Blocks :latest tag unless --allow-latest is set
  • ✅ Idempotency - Repeated applies without changes result in no-op
  • ✅ Concurrent-safe - Handles multiple goroutines for task monitoring with mutexes
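
Image tag validation can be sketched as a small check over image references. The rules below are assumptions rather than stackman's confirmed behaviour: an untagged image counts as :latest (that is Docker's default), a digest-pinned reference is accepted, and a colon in a registry host:port is not mistaken for a tag. `usesLatest` and `validateImages` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// usesLatest reports whether an image reference resolves to the :latest tag.
func usesLatest(image string) bool {
	if strings.Contains(image, "@") {
		return false // pinned by digest
	}
	// Only the last path component can carry a tag; earlier colons
	// belong to a registry host:port.
	name := image
	if i := strings.LastIndex(name, "/"); i >= 0 {
		name = name[i+1:]
	}
	_, tag, found := strings.Cut(name, ":")
	return !found || tag == "latest"
}

// validateImages rejects :latest references unless allowLatest is set.
func validateImages(images []string, allowLatest bool) error {
	if allowLatest {
		return nil
	}
	for _, img := range images {
		if usesLatest(img) {
			return fmt.Errorf("image %q resolves to :latest; pin a tag or pass --allow-latest", img)
		}
	}
	return nil
}

func main() {
	images := []string{"nginx:1.25-alpine", "registry.example.com:5000/myapp"}
	fmt.Println(validateImages(images, false))
	fmt.Println(validateImages(images, true))
}
```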

🔧 Operational Features

  • ✅ Multiple subcommands - apply, rollback, diff, status, logs, events
  • ✅ CI/CD friendly - Proper exit codes (0=success, 1=failure, 2=timeout, 130=interrupted)
  • ✅ TLS support - Respects DOCKER_HOST, DOCKER_TLS_VERIFY, DOCKER_CERT_PATH
  • ✅ Registry authentication - Uses DOCKER_CONFIG_PATH for private registry auth (config.json)
  • ✅ Parallel updates - --parallel flag for concurrent service updates (not yet fully implemented)
  • ✅ Minimal dependencies - Only uses: github.com/docker/docker, github.com/docker/go-units, golang.org/x/net, gopkg.in/yaml.v3

Installation

Option 1: Download Pre-built Binary

# Linux (amd64)
wget https://github.com/SomeBlackMagic/stackman/releases/latest/download/stackman-linux-amd64
chmod +x stackman-linux-amd64
sudo mv stackman-linux-amd64 /usr/local/bin/stackman

# macOS (amd64)
curl -L https://github.com/SomeBlackMagic/stackman/releases/latest/download/stackman-darwin-amd64 -o stackman
chmod +x stackman
sudo mv stackman /usr/local/bin/stackman

Option 2: Build from Source

git clone https://github.com/SomeBlackMagic/stackman.git
cd stackman
go build -o stackman .
sudo mv stackman /usr/local/bin/stackman

Option 3: Using Make

make build          # Build binary
make install        # Install to /usr/local/bin

Verify Installation

stackman version

Usage

Commands Overview

stackman <command> [flags]

Available Commands

| Command | Description | Status |
| --- | --- | --- |
| apply | Deploy or update a stack | ✅ Implemented |
| rollback | Rollback stack to previous state | 🚧 Stub |
| diff | Show deployment plan without applying | 🚧 Stub |
| status | Show current stack status | 🚧 Stub |
| logs | Show logs for stack services | 🚧 Stub |
| events | Show events for stack services | 🚧 Stub |
| version | Show version information | ✅ Implemented |

apply Command (Primary Usage)

Deploy or update a Docker Swarm stack with health monitoring and automatic rollback.

Basic Syntax

stackman apply -n <stack-name> -f <compose-file> [flags]

Flags

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| -n, --name | string | (required) | Stack name |
| -f, --file | string | (required) | Path to docker-compose.yml |
| --values | string | - | Values file for templating (not yet implemented) |
| --set | string | - | Set values (key=value pairs, not yet implemented) |
| --timeout | duration | 15m | Deployment health check timeout |
| --rollback-timeout | duration | 10m | Rollback timeout |
| --no-wait | bool | false | Don't wait for health checks |
| --prune | bool | false | Remove orphaned services |
| --allow-latest | bool | false | Allow :latest image tags |
| --parallel | int | 1 | Parallel service updates (not yet implemented) |
| --logs | bool | true | Stream container logs during deployment |

Examples

Basic Deployment

stackman apply -n mystack -f docker-compose.yml

With Custom Timeouts

stackman apply -n mystack -f docker-compose.yml --timeout 20m --rollback-timeout 5m

Deploy Without Waiting (Fire and Forget)

stackman apply -n mystack -f docker-compose.yml --no-wait

Remove Obsolete Services

stackman apply -n mystack -f docker-compose.yml --prune

Allow :latest Tag (Not Recommended for Production)

stackman apply -n mystack -f docker-compose.yml --allow-latest

Disable Log Streaming

stackman apply -n mystack -f docker-compose.yml --logs=false

Using Environment Variables for Docker Connection

export DOCKER_HOST=tcp://192.168.1.100:2376
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH=/path/to/certs
stackman apply -n mystack -f docker-compose.yml

Using Private Registry Authentication

# Ensure $HOME/.docker/config.json contains auth credentials
# Or set custom path:
export DOCKER_CONFIG_PATH=/etc/docker
stackman apply -n mystack -f docker-compose.yml

Configuration

Environment Variables

stackman reads configuration from environment variables:

Docker Connection

| Variable | Description | Default | Example |
| --- | --- | --- | --- |
| DOCKER_HOST | Docker daemon socket | unix:///var/run/docker.sock | tcp://192.168.1.100:2376 |
| DOCKER_TLS_VERIFY | Enable TLS verification | 0 | 1 |
| DOCKER_CERT_PATH | Path to TLS certificates | - | /etc/docker/certs |
| DOCKER_CONFIG_PATH | Path to Docker config directory (for registry auth) | $HOME/.docker | /etc/docker |

Deployment Behavior

| Variable | Description | Default | Example |
| --- | --- | --- | --- |
| STACKMAN_WORKDIR | Base path for relative volume mounts | Current working directory | /var/app/stacks/production |
| STACKMAN_DEPLOY_TIMEOUT | Deployment timeout (overridden by --timeout flag) | 15m | 20m |
| STACKMAN_ROLLBACK_TIMEOUT | Rollback timeout (overridden by --rollback-timeout flag) | 10m | 5m |

Logging & Output

| Variable | Description | Default | Example |
| --- | --- | --- | --- |
| LOG_LEVEL | Log verbosity (not yet implemented) | info | debug |
| NO_COLOR | Disable colored output (not yet implemented) | false | true |

Configuration Precedence

Priority (highest to lowest):

  1. Command-line flags (e.g., --timeout 20m)
  2. Environment variables (e.g., STACKMAN_DEPLOY_TIMEOUT=20m)
  3. Default values (e.g., 15m for timeout)

Exit Codes

stackman follows standard Unix exit code conventions for CI/CD integration:

| Code | Meaning | Trigger Condition | Rollback Performed? |
| --- | --- | --- | --- |
| 0 | Success | All services deployed and healthy | N/A |
| 1 | Failure | Deployment failed (parse error, API error, validation failed) | ✅ Yes (if deployment started) |
| 2 | Timeout | Health check timeout reached | ✅ Yes |
| 3 | Rollback Failed | Deployment failed AND rollback also failed (as per spec) | ⚠️ Attempted but failed |
| 4 | Connection Error | Docker API/Registry connection failed (as per spec) | N/A |
| 130 | Interrupted | User pressed Ctrl+C (SIGINT) or SIGTERM received | ✅ Yes |

Exit Code Usage in CI/CD

# GitLab CI / GitHub Actions example
# Capture the exit code without aborting under `set -e`, which CI runners commonly enable
EXIT_CODE=0
stackman apply -n production -f docker-compose.yml || EXIT_CODE=$?

if [ "$EXIT_CODE" -eq 0 ]; then
  echo "✅ Deployment successful"
elif [ "$EXIT_CODE" -eq 1 ]; then
  echo "❌ Deployment failed, rollback succeeded"
  exit 1
elif [ "$EXIT_CODE" -eq 2 ]; then
  echo "⏱️ Deployment timeout, rollback succeeded"
  exit 1
elif [ "$EXIT_CODE" -eq 130 ]; then
  echo "🛑 Deployment interrupted, rollback succeeded"
  exit 1
else
  echo "💥 Critical error (code $EXIT_CODE)"
  exit "$EXIT_CODE"
fi

Requirements

Runtime Requirements

| Requirement | Version | Purpose |
| --- | --- | --- |
| Docker Engine | 19.03+ | Swarm API access |
| Docker Swarm | Initialized | docker swarm init |
| Operating System | Linux / macOS / Windows | Cross-platform |
| Architecture | amd64 / arm64 | Binary architecture |

Build Requirements (Only for Building from Source)

| Requirement | Version | Purpose |
| --- | --- | --- |
| Go | 1.24+ | Compiler toolchain |
| Make | (optional) | Build automation |

Docker Compose File Requirements

  • ✅ Supported: Compose file format version 3.x (Swarm mode)
  • ❌ Not Supported: Compose file version 2.x (standalone Docker)

Health Check Recommendations

  • ✅ Recommended: Define healthcheck for all services for accurate deployment validation
  • ⚠️ Optional: Services without healthcheck are considered healthy if task is in running state
  • 🔍 Best Practice: Use fast healthchecks (interval: 5-10s) with reasonable start_period

Minimal Working Example

version: '3.8'

services:
  web:
    image: nginx:1.25-alpine
    healthcheck:
      test: [ "CMD", "wget", "-q", "--spider", "http://localhost" ]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 5s
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        delay: 10s

networks:
  default:
    driver: overlay

Pre-Deployment Checklist

Before using stackman, ensure:

# 1. Docker is running
docker info

# 2. Swarm is initialized (should print "active"; if not, run: docker swarm init)
docker info --format '{{.Swarm.LocalNodeState}}'

# 3. You are a swarm manager node
docker node ls

# 4. Your compose file is valid
docker-compose -f docker-compose.yml config

# 5. Required images are pullable (for private registries)
docker login registry.example.com

Output Examples

Successful Deployment

2025/11/06 14:33:05 Start Docker Stack Wait version=1.0.0 revision=abc123
2025/11/06 14:33:05 Parsing compose file: docker-compose.yml
2025/11/06 14:33:05 Creating snapshot of current stack state...
2025/11/06 14:33:06 Snapshotted service: mystack_web (version 42)
2025/11/06 14:33:06 Snapshotted service: mystack_api (version 38)
2025/11/06 14:33:06 Snapshot created with 2 services
2025/11/06 14:33:06 Starting deployment of stack: mystack
2025/11/06 14:33:06 No obsolete services to remove
2025/11/06 14:33:06 Pulling image: nginx:latest
2025/11/06 14:33:08 Image nginx:latest pulled successfully
2025/11/06 14:33:08 Network mystack_default already exists
2025/11/06 14:33:08 Updating service: mystack_web
2025/11/06 14:33:09 Service mystack_web updated, waiting for tasks to be recreated...
[event:service:mystack_web] update
[event:container:mystack_web.1.xyz] start
Stack deployed successfully. Starting health checks...
2025/11/06 14:33:12 Starting log streaming for 2 services...
Waiting for service tasks to start...
2025/11/06 14:33:15 Waiting for services to become healthy...
2025/11/06 14:33:20 Container statuses: mystack_web.1: running/starting, mystack_api.1: running/healthy
[event:container:mystack_web.1.xyz] healthcheck passed (exit 0): curl -f http://localhost
2025/11/06 14:33:25 Container statuses: mystack_web.1: running/healthy, mystack_api.1: running/healthy
2025/11/06 14:33:25 All containers are healthy (checked 2 containers)
All containers healthy.

Failed Deployment with Rollback

2025/11/06 14:45:10 Start Docker Stack Wait version=1.0.0 revision=abc123
2025/11/06 14:45:10 Parsing compose file: docker-compose.yml
2025/11/06 14:45:10 Creating snapshot of current stack state...
2025/11/06 14:45:11 Starting deployment of stack: mystack
2025/11/06 14:45:11 Pulling image: myapp:broken-version
2025/11/06 14:45:15 Updating service: mystack_api
[event:service:mystack_api] update
[event:container:mystack_api.1.abc] start
[event:container:mystack_api.1.abc] healthcheck failed (exit 1): curl -f http://localhost/health
2025/11/06 14:45:30 ERROR: Service mystack_api task abc123def456 failed with state shutdown (desired: shutdown)
2025/11/06 14:45:30 ERROR: New task abc123def456 failed with state complete (desired: shutdown): task: non-zero exit (1)
  Container exit code: 1
  Task was shutdown and replaced (likely healthcheck failure)
ERROR: Services failed healthcheck or didn't start in time.
Starting rollback to previous state...
2025/11/06 14:45:31 Rolling back stack: mystack
2025/11/06 14:45:31 Rolling back service: mystack_api to version 38
2025/11/06 14:45:32 Service mystack_api rolled back successfully
2025/11/06 14:45:32 Rollback completed for stack: mystack
Rollback completed successfully

Interrupted Deployment

2025/11/06 14:50:15 Start Docker Stack Wait version=1.0.0 revision=abc123
2025/11/06 14:50:15 Deploying stack: mystack
[event:service:mystack_web] update
^C
2025/11/06 14:50:20 Received signal: interrupt
2025/11/06 14:50:20 Deployment interrupted, initiating rollback...
Starting rollback to previous state...
2025/11/06 14:50:21 Rolling back stack: mystack
2025/11/06 14:50:22 Service mystack_web rolled back successfully
Rollback completed successfully

Project Structure

stackman follows a clean, modular architecture aligned with the technical specification:

stackman/
├── main.go                      # Entry point and CLI orchestration
├── cmd/                         # CLI commands (cobra-like structure)
│   ├── root.go                  # Command router and usage
│   ├── apply.go                 # apply command (✅ IMPLEMENTED)
│   ├── rollback.go              # rollback command (🚧 stub)
│   ├── logs.go                  # logs command (🚧 stub)
│   ├── events.go                # events command (🚧 stub)
│   ├── stubs.go                 # Stub implementations for incomplete commands
│   └── version.go               # version command (✅ IMPLEMENTED)
├── internal/                    # Internal packages (not importable externally)
│   ├── compose/                 # ✅ Compose file parsing (no external libs)
│   │   ├── types.go             # Compose spec types (services, networks, volumes, etc.)
│   │   ├── parser.go            # YAML → ComposeSpec parser (gopkg.in/yaml.v3)
│   │   └── converter.go         # ComposeSpec → Swarm ServiceSpec converter
│   ├── swarm/                   # ✅ Docker Swarm API client wrapper
│   │   ├── interface.go         # StackDeployer interface
│   │   ├── stack.go             # Stack deployment orchestration
│   │   ├── services.go          # Service create/update logic
│   │   ├── images.go            # Image pull with progress tracking
│   │   ├── networks.go          # Network create/inspect logic
│   │   ├── volumes.go           # Volume create/inspect logic
│   │   ├── cleanup.go           # Obsolete service removal
│   │   ├── rollback.go          # Rollback execution
│   │   └── state.go             # Current swarm state reading
│   ├── health/                  # ✅ Health monitoring and event handling
│   │   ├── watcher.go           # Event-driven task watcher (Docker events API)
│   │   ├── monitor.go           # Per-task monitor with log streaming
│   │   ├── events.go            # Event types and subscription
│   │   └── service_update_monitor.go  # ServiceInspect UpdateStatus tracker
│   ├── snapshot/                # ✅ Snapshot creation and restoration
│   │   └── snapshot.go          # ServiceInspect capture and rollback
│   ├── deployment/              # ✅ Deployment ID generation
│   │   └── id.go                # Unique deployment ID (timestamp-based)
│   ├── paths/                   # ✅ Path resolution logic
│   │   └── resolver.go          # STACKMAN_WORKDIR + relative → absolute conversion
│   ├── plan/                    # 🚧 Diff and deployment plan (partially implemented)
│   │   ├── types.go             # Plan types (Create/Update/Delete)
│   │   ├── planner.go           # Diff logic (current vs desired state)
│   │   └── formatter.go         # Plan output formatting
│   ├── apply/                   # 🔜 Apply orchestration (currently in cmd/apply.go)
│   ├── rollback/                # 🔜 Rollback orchestration (currently in snapshot/)
│   ├── signals/                 # 🔜 SIGINT/SIGTERM handling (currently in cmd/apply.go)
│   └── output/                  # 🔜 Structured output and formatting
├── tests/                       # ✅ Integration tests
│   ├── apply_test.go            # Full deployment cycle tests
│   ├── health_test.go           # Health check monitoring tests
│   ├── resources_test.go        # Networks, volumes, secrets, configs tests
│   ├── operations_test.go       # Prune, signals tests
│   ├── negative_test.go         # Error handling tests
│   ├── helpers_test.go          # Test utilities
│   └── testdata/                # Test compose files
└── docs/                        # Documentation
    ├── stackman-diff-techspec-full.md  # Full technical specification
    ├── TEST_RESULTS.md          # Test validation results
    └── TESTING.md               # Testing guide

Package Responsibilities

| Package | Responsibility | Status |
| --- | --- | --- |
| cmd/ | CLI interface, argument parsing, command routing | ✅ Core commands implemented |
| internal/compose/ | Parse docker-compose.yml → internal model | ✅ Fully implemented |
| internal/swarm/ | Docker Swarm API operations (services, tasks, networks, volumes) | ✅ Core operations implemented |
| internal/health/ | Event-driven task monitoring, health checks, log streaming | ✅ Fully implemented |
| internal/snapshot/ | Capture and restore service state for rollback | ✅ Implemented |
| internal/deployment/ | Generate unique deployment IDs for task tracking | ✅ Implemented |
| internal/paths/ | Resolve relative paths to absolute using STACKMAN_WORKDIR | ✅ Implemented |
| internal/plan/ | Diff current vs desired state, generate deployment plan | 🚧 Partially implemented |
| internal/apply/ | High-level apply orchestration | 🔜 To be extracted from cmd/apply.go |
| internal/rollback/ | High-level rollback orchestration | 🔜 To be extracted from snapshot/ |
| internal/signals/ | SIGINT/SIGTERM handling with context propagation | 🔜 To be extracted from cmd/apply.go |
| internal/output/ | Structured logging, JSON output, progress formatting | 🔜 Planned |

Docker Compose Support

stackman includes a comprehensive Docker Compose parser that converts docker-compose.yml files to Docker Swarm service specifications.

Supported Docker Compose Features

Service Configuration

  • Images & Build: image, build (context, dockerfile, args, target, cache_from)
  • Commands: command, entrypoint
  • Environment: environment (array and map formats), env_file
  • Container Settings: hostname, domainname, user, working_dir, stdin_open, tty, read_only, init
  • Lifecycle: stop_signal, stop_grace_period, restart

Networking

  • Ports: Short syntax ("8080:80") and long syntax (with mode and protocol)
  • Networks: Network attachment with aliases
  • DNS: dns, dns_search, dns_opt
  • Hosts: extra_hosts, mac_address
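
As an illustration of short port syntax parsing, the sketch below handles the forms "target", "published:target", and an optional "/protocol" suffix. It is a simplified assumption of what such a parser does: host IP prefixes and port ranges, which Compose also allows, are deliberately left out, and `portMapping` and `parseShortPort` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// portMapping is a reduced form of a published port.
// Published is 0 when only a target port was given.
type portMapping struct {
	Published uint16
	Target    uint16
	Protocol  string
}

// parseShortPort parses "80", "8080:80" or "8080:80/udp".
func parseShortPort(s string) (portMapping, error) {
	spec, proto, hasProto := strings.Cut(s, "/")
	if !hasProto {
		proto = "tcp"
	}
	switch proto {
	case "tcp", "udp", "sctp":
	default:
		return portMapping{}, fmt.Errorf("port %q: unknown protocol %q", s, proto)
	}

	published, target, hasPublished := strings.Cut(spec, ":")
	if !hasPublished {
		target, published = published, "" // a single number is the target port
	}

	t, err := strconv.ParseUint(target, 10, 16)
	if err != nil {
		return portMapping{}, fmt.Errorf("port %q: bad target port: %w", s, err)
	}
	m := portMapping{Target: uint16(t), Protocol: proto}
	if published != "" {
		p, err := strconv.ParseUint(published, 10, 16)
		if err != nil {
			return portMapping{}, fmt.Errorf("port %q: bad published port: %w", s, err)
		}
		m.Published = uint16(p)
	}
	return m, nil
}

func main() {
	for _, s := range []string{"8080:80", "53:53/udp", "80", "8080:http"} {
		m, err := parseShortPort(s)
		fmt.Println(s, m, err)
	}
}
```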

Storage

  • Volumes: Bind mounts with automatic relative → absolute path conversion
  • Named Volumes: Volume references from top-level volumes: section
  • Tmpfs: Temporary filesystem mounts

Health Checks

  • Test Commands: CMD-SHELL and exec array formats
  • Timing: interval, timeout, retries, start_period
  • Control: disable flag

Deployment (Swarm-specific)

  • Mode: replicated (with replica count) or global
  • Updates: Parallelism, delay, order, failure action, monitor period, max failure ratio
  • Rollback: Same configuration as updates
  • Resources: CPU and memory limits/reservations
  • Restart Policy: Condition, delay, max attempts, window
  • Placement: Node constraints, spread preferences, max replicas per node

Security & Capabilities

  • Capabilities: cap_add, cap_drop
  • Devices: Device mappings
  • Isolation: Container isolation technology

Top-Level Sections

  • Services: Complete service definitions
  • Networks: Custom networks with driver options, IPAM config
  • Volumes: Named volumes with driver options
  • Secrets: File or external secrets (parsed, creation not implemented)
  • Configs: File or external configs (parsed, creation not implemented)

Known Limitations

Some Docker Compose fields are parsed but not applied due to Docker Swarm API restrictions:

| Field | Reason |
| --- | --- |
| privileged | Not supported in Swarm mode |
| security_opt | Not available in Swarm ContainerSpec |
| sysctls | Not available in Swarm ContainerSpec |
| ulimits | Not available in Swarm ContainerSpec |
| links, external_links | Deprecated in favor of networks |
| depends_on | No start order control in Swarm |

These fields remain in the type definitions for completeness and potential future use.


Use Cases

CI/CD Pipelines

# GitLab CI example
deploy:
  stage: deploy
  script:
    - stackman apply -n production -f docker-compose.yml --timeout 10m --rollback-timeout 5m
  only:
    - main

Blue-Green Deployments

# Deploy to green environment
stackman apply -n green-stack -f docker-compose.yml

# If successful, switch traffic and deploy to blue
stackman apply -n blue-stack -f docker-compose.yml

Canary Deployments

# docker-compose.yml
services:
  web:
    image: myapp:${VERSION}
    deploy:
      replicas: 1  # Start with 1 replica
      update_config:
        parallelism: 1
        delay: 30s

# Deploy canary
VERSION=v2.0 stackman apply -n mystack -f docker-compose.yml

# If healthy, scale up
docker service scale mystack_web=10

Multi-Environment Deployment

#!/bin/bash
# deploy.sh

ENVIRONMENTS=("dev" "staging" "production")
COMPOSE_FILE="docker-compose.yml"

for ENV in "${ENVIRONMENTS[@]}"; do
  echo "Deploying to $ENV..."
  if stackman apply -n "${ENV}-stack" -f "$COMPOSE_FILE" --timeout 15m --rollback-timeout 3m; then
    echo "✓ $ENV deployment successful"
  else
    echo "✗ $ENV deployment failed, stopping"
    exit 1
  fi
done

Troubleshooting

Common Issues and Solutions

🔴 Issue: Deployment times out waiting for health checks

Symptoms:

[HealthCheck] ⏳ Task abc123 (mystack_web) is starting
ERROR: timeout after 15m waiting for services to become healthy

Root Causes:

  • Health check start_period too short for slow-starting apps
  • Health check command timing out or failing
  • Insufficient resources (CPU/memory) causing slow startup

Solutions:

  1. Increase deployment timeout:

    stackman apply -n mystack -f docker-compose.yml --timeout 30m
  2. Adjust healthcheck in compose file:

    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 60s  # Increase if app needs more startup time
  3. Test healthcheck command manually:

    docker exec <container-id> curl -f http://localhost/health

🔴 Issue: Service keeps failing with "task: non-zero exit"

Symptoms:

[ServiceMonitor] ❌ Service mystack_api: Task xyz789 failed - task: non-zero exit (1)
ERROR: New task failed with state complete (desired: shutdown): task: non-zero exit (1)

Root Causes:

  • Application crash on startup
  • Health check failing repeatedly
  • Missing environment variables or secrets
  • Configuration errors

Solutions:

  1. Check service logs:

    docker service logs mystack_servicename --tail 100
  2. Inspect task details:

    docker service ps mystack_servicename --no-trunc
  3. Test container locally (before swarm deployment):

    docker run --rm myimage:tag
  4. Check for missing config/secrets:

    docker config ls
    docker secret ls

🔴 Issue: Rollback restores old version but it's also unhealthy

Symptoms:

Starting rollback to previous state...
[ServiceMonitor] ❌ Service mystack_web: Task def456 failed
ERROR: Rollback failed: services did not become healthy after rollback

Root Cause: Previous version has underlying health issues (database schema mismatch, missing dependencies, etc.)

Solutions:

  1. Deploy a known-good version (not rollback):

    # Use older compose file with working version
    stackman apply -n mystack -f docker-compose.v1.2.3.yml
  2. Fix health check in previous version:

    # Temporarily disable strict health checks
    healthcheck:
      test: ["CMD", "true"]  # Always passes
  3. Manual intervention required:

    # Remove stack completely
    docker stack rm mystack
    
    # Wait for cleanup
    sleep 30
    
    # Deploy known-good version
    stackman apply -n mystack -f docker-compose.good.yml

🔴 Issue: "No services found" or "Stack not found"

Symptoms:

ERROR: No services found for stack: mystack

Root Causes:

  • Stack name mismatch
  • Swarm mode not initialized
  • Wrong Docker daemon context

Solutions:

  1. Verify swarm is initialized:

    docker info | grep Swarm
    # Should show: "Swarm: active"
    
    # If not active:
    docker swarm init
  2. List existing stacks:

    docker stack ls
  3. Check Docker context:

    docker context ls
    docker context use default

🔴 Issue: Image pull fails with authentication error

Symptoms:

ERROR: failed to pull image registry.example.com/myapp:latest: unauthorized

Solutions:

  1. Login to registry:

    docker login registry.example.com
  2. Set Docker config path:

    export DOCKER_CONFIG_PATH=$HOME/.docker
    stackman apply -n mystack -f docker-compose.yml
  3. Verify auth config:

    cat $HOME/.docker/config.json
    # Should contain "auths": { "registry.example.com": { "auth": "..." } }
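The manual check in step 3 can also be scripted. The following is a minimal sketch that parses content shaped like `config.json` and reports whether a registry has a non-empty `auth` entry. The `hasAuth` helper and the sample JSON are illustrative assumptions, not part of stackman. Note that when a credential store is configured, the `auths` entry can legitimately be empty, and this sketch would then report `false`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dockerConfig models only the part of config.json that this check needs.
type dockerConfig struct {
	Auths map[string]struct {
		Auth string `json:"auth"`
	} `json:"auths"`
}

// hasAuth reports whether raw contains a non-empty "auth" value for registry.
func hasAuth(raw []byte, registry string) (bool, error) {
	var cfg dockerConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, fmt.Errorf("parse docker config: %w", err)
	}
	entry, ok := cfg.Auths[registry]
	return ok && entry.Auth != "", nil
}

func main() {
	// Sample content; the auth value is base64 of the placeholder "user:pass".
	raw := []byte(`{"auths": {"registry.example.com": {"auth": "dXNlcjpwYXNz"}}}`)
	for _, reg := range []string{"registry.example.com", "other.example.com"} {
		ok, err := hasAuth(raw, reg)
		fmt.Println(reg, ok, err)
	}
}
```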

🔴 Issue: Tasks stuck in "Preparing" state

Symptoms:

[HealthCheck] ⏳ Task ghi012 (mystack_db) is preparing
(repeats indefinitely)

Root Causes:

  • Image pull in progress (large images)
  • Node resource constraints
  • Network issues

Solutions:

  1. Check task status:

    docker service ps mystack_servicename --no-trunc
  2. Check node resources:

    docker node ls
    docker node inspect <node-id> --format '{{.Status}}'
  3. Manually pull image on node:

    # SSH to swarm node
    docker pull myregistry/myimage:tag

🟑 Issue: stackman hangs after "Stack deployed successfully"

Symptom: Command doesn't exit after deployment

Root Cause: Waiting for health checks (expected behavior)

Solutions:

  1. Use --no-wait if you don't want to wait:

    stackman apply -n mystack -f docker-compose.yml --no-wait
  2. Check health check status in another terminal:

    docker service ps mystack_servicename

Debugging Commands

# View real-time events
docker events --filter type=container --filter type=service

# Inspect service configuration
docker service inspect mystack_servicename --pretty

# Check service update status
docker service inspect mystack_servicename --format '{{.UpdateStatus}}'

# View task history (including failed tasks)
docker service ps mystack_servicename --no-trunc

# Follow service logs across all tasks (last 100 lines)
docker service logs mystack_servicename --tail 100 --follow

TODOLIST

This is a comprehensive roadmap aligned with the technical specification. Items are prioritized by importance and complexity.

🔥 Priority 1: Core Functionality (MVP Requirements)

Deployment & Health Monitoring

  • Snapshot-based rollback - Capture service state before deployment
  • Event-driven task monitoring - Subscribe to Docker events for task lifecycle
  • Health status polling - Check ContainerInspect.State.Health
  • UpdateStatus tracking - Wait for ServiceInspect.UpdateStatus.State == completed
  • DeployID injection - Add com.stackman.deploy.id label to tasks
  • Signal handling - SIGINT/SIGTERM → rollback
  • Exit code alignment - Implement codes 2 (timeout), 3 (rollback failed), 4 (connection error)
  • Reconciliation loop - Periodic task list refresh to catch missed events

Compose File Support

  • YAML parser - Parse docker-compose.yml with gopkg.in/yaml.v3
  • Path resolution - Convert relative paths using STACKMAN_WORKDIR
  • Environment substitution - Support ${VAR} syntax
  • Service spec conversion - Map deploy.* to Swarm ServiceSpec
  • Secrets creation - Implement docker secret create from secrets: section
  • Configs creation - Implement docker config create from configs: section
  • Templating engine - Implement --values and --set (basic key-value replacement)
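The `${VAR}` substitution item can be approximated with the standard library's `os.Expand`. The sketch below is an illustration, not the project's parser: `substitute` is a hypothetical helper, it also expands the bare `$VAR` form, and it does not handle Compose extensions such as `${VAR:-default}` or the `$$` escape.

```go
package main

import (
	"fmt"
	"os"
)

// substitute expands ${VAR} and $VAR references in s using vars.
// Unknown variables expand to the empty string and are reported in missing,
// so the caller can decide whether to warn or fail.
func substitute(s string, vars map[string]string) (out string, missing []string) {
	out = os.Expand(s, func(name string) string {
		v, ok := vars[name]
		if !ok {
			missing = append(missing, name)
		}
		return v
	})
	return out, missing
}

func main() {
	vars := map[string]string{"TAG": "1.2.3"}
	out, missing := substitute("image: registry.example.com/app:${TAG} stage=${STAGE}", vars)
	fmt.Printf("%q missing=%v\n", out, missing)
}
```

Returning the missing names instead of silently substituting an empty string makes it possible to reject a compose file that would otherwise deploy with a blank image tag.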

Resource Management

  • Network creation - Create overlay networks from networks: section
  • Volume creation - Create local volumes from volumes: section
  • Secrets handling - Full lifecycle (create, update, attach to services)
  • Configs handling - Full lifecycle (create, update, attach to services)
  • External resources - Respect external: true flag (skip create/delete)
  • Resource pruning - Implement --prune for orphaned networks/volumes/secrets/configs

🚀 Priority 2: Advanced Features

Commands

  • apply command - Main deployment workflow (✅ implemented)
  • diff command - Show deployment plan without applying
  • status command - Show current stack status with health info
  • logs command - Stream logs from stack services with filters
  • events command - Stream Docker events filtered by stack
  • rollback command - Standalone rollback from saved snapshot

Deployment Intelligence

  • Diff-based planning - Only update services that actually changed (use internal/plan)
  • Parallel service updates - Implement --parallel flag for concurrent updates
  • Dependency ordering - Respect depends_on for deployment order (best-effort)
  • Smart rollback decision - Only roll back changed services, not the entire stack
  • Update progress tracking - Real-time progress bar with task counts
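Dependency ordering from `depends_on` is a topological sort. The following sketch shows one way to compute a deployment order with a depth-first traversal; `deployOrder` and the sample service names are hypothetical, and the function only orders services without waiting for any of them to become healthy.

```go
package main

import (
	"fmt"
	"sort"
)

// deployOrder returns service names so that every service appears after the
// services it depends on. It fails on a cycle or an undeclared dependency.
func deployOrder(dependsOn map[string][]string) ([]string, error) {
	const (
		unvisited = iota
		visiting
		done
	)
	state := make(map[string]int, len(dependsOn))
	order := make([]string, 0, len(dependsOn))

	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("dependency cycle involving %q", name)
		}
		deps, ok := dependsOn[name]
		if !ok {
			return fmt.Errorf("unknown service %q", name)
		}
		state[name] = visiting
		for _, dep := range deps {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[name] = done
		order = append(order, name)
		return nil
	}

	// Sort the names so the result does not depend on map iteration order.
	names := make([]string, 0, len(dependsOn))
	for name := range dependsOn {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if err := visit(name); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	order, err := deployOrder(map[string][]string{
		"web": {"api"},
		"api": {"db"},
		"db":  nil,
	})
	fmt.Println(order, err) // prints [db api web] <nil>
}
```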

Safety & Validation

  • :latest tag blocking - Require --allow-latest flag
  • Conflict detection - Check for name conflicts in resources
  • Version conflict handling - Retry ServiceUpdate on Version.Index race condition
  • Secret content masking - Never log secret data
  • Dry-run mode - --dry-run flag to show plan without applying
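Blocking `:latest` requires deciding when an image reference resolves to that tag, including the case where no tag is given at all. The sketch below uses plain string handling rather than a full reference parser, so treat it as an approximation; `usesLatest` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// usesLatest reports whether an image reference resolves to the "latest" tag,
// either explicitly or because neither a tag nor a digest is given.
func usesLatest(image string) bool {
	// A digest pins the image content, so any tag is irrelevant.
	if strings.Contains(image, "@") {
		return false
	}
	// Look for a tag only in the last path component, so a registry port
	// such as "registry.example.com:5000/app" is not mistaken for a tag.
	last := image[strings.LastIndex(image, "/")+1:]
	i := strings.LastIndex(last, ":")
	if i < 0 {
		return true // no tag: Docker defaults to "latest"
	}
	return last[i+1:] == "latest"
}

func main() {
	for _, img := range []string{
		"nginx",
		"nginx:latest",
		"nginx:1.27",
		"registry.example.com:5000/app",
		"registry.example.com:5000/app:1.0",
	} {
		fmt.Printf("%-36s latest=%v\n", img, usesLatest(img))
	}
}
```

A check like this would run over every service image before deployment and fail unless the `--allow-latest` flag from the list above is set.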

🔧 Priority 3: Production Readiness

Observability

  • Structured logging - Implement internal/output with log levels
  • JSON output mode - --json flag for machine-readable output
  • Progress formatting - Pretty progress bars and status tables
  • Deployment metrics - Report deployment duration, task counts, health check times
  • Audit log - Log all actions (create, update, delete) with timestamps

Configuration

  • Config file support - .stackman.yaml for default settings
  • Multiple Docker hosts - --docker-host flag for remote deployments
  • TLS configuration - Full support for --tls, --cert-path, --tlsverify
  • Context switching - Respect Docker contexts

Reliability

  • Retry logic - Exponential backoff for API failures
  • Timeout configurability - Per-service timeout overrides
  • Graceful degradation - Continue deployment if non-critical services fail
  • Connection pooling - Optimize Docker API client usage

🧪 Priority 4: Testing & Documentation

Testing

  • Unit tests - Core logic (compose parser, path resolver, health monitor)
  • Integration tests - Full deployment cycle with real Swarm
  • Negative tests - Error handling (API failures, timeout, invalid compose)
  • Rollback tests - Verify rollback correctness
  • Signal tests - SIGINT/SIGTERM handling
  • Secrets/configs tests - Lifecycle testing
  • Performance tests - Large stacks (20+ services)

Documentation

  • README - Comprehensive usage guide
  • Technical spec - Full architecture document
  • API documentation - GoDoc for all packages
  • Examples - Real-world compose files (with secrets, configs, multiple networks)
  • Troubleshooting guide - Common errors and solutions (✅ added to README)
  • Migration guide - From docker stack deploy to stackman

📦 Priority 5: Nice-to-Have

  • Auto-completion - Bash/Zsh/Fish completion scripts
  • Plugin system - Custom hooks (pre-deploy, post-deploy, on-failure)
  • Remote snapshots - Store snapshots in S3/registry for team collaboration
  • Canary deployments - Built-in support for gradual rollouts
  • Blue-green deployments - Automated traffic switching
  • Notification integrations - Slack/Discord/PagerDuty webhooks
  • Web UI - Optional web dashboard for stack status
  • Multi-stack orchestration - Deploy multiple stacks with dependencies

🐛 Known Issues

  • Race condition in event handling - Rare: events may be missed if the subscription starts after task creation (workaround: reconciliation loop)
  • Large image pull timeout - No streaming progress for image pull in logs
  • No task restart limit - Swarm may restart failed tasks indefinitely
  • Health check log truncation - Long health check output is truncated
  • Parallel updates not implemented - --parallel flag parsed but not used

📊 Implementation Status Summary

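| Category            | Implemented | Planned | Total | % Complete |
|---------------------|-------------|---------|-------|------------|
| Core Deployment     | 9           | 3       | 12    | 75%        |
| Compose Support     | 4           | 3       | 7     | 57%        |
| Resources           | 2           | 5       | 7     | 29%        |
| Commands            | 2           | 5       | 7     | 29%        |
| Safety & Validation | 1           | 4       | 5     | 20%        |
| Testing             | 2           | 6       | 8     | 25%        |
| TOTAL               | 20          | 26      | 46    | 43%        |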

Contributing Priority

If you want to contribute, focus on these high-impact items:

  1. Exit code alignment - Easy win, improves CI/CD integration
  2. Secrets/Configs creation - Required for full compose parity
  3. diff command - Highly requested, relatively simple
  4. JSON output mode - Enables advanced tooling integration
  5. Reconciliation loop - Improves reliability

See CONTRIBUTING.md for development guidelines.


Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Development Setup

# Clone repository
git clone https://github.com/SomeBlackMagic/stackman.git
cd stackman

# Install dependencies
go mod download

# Run tests
go test ./...

# Build
go build -o stackman .

License

Licensed under the GPL-3.0 license. See LICENSE for details.


Credits

Developed by SomeBlackMagic

Built with:
