@cofin cofin commented Oct 11, 2025

Summary

Implements hybrid versioning for SQLSpec migrations - timestamps in development, sequential in production. Inspired by goose migrations.

Key Features:

  • Timestamp-based migrations in development (no conflicts)
  • Sequential migrations in production (deterministic ordering)
  • Automated conversion via sqlspec fix command
  • Database tracking synchronization for seamless workflows
  • Full backup/rollback safety mechanisms

Closes #116


The Problem

Traditional migration versioning has trade-offs:

Sequential-Only (0001, 0002, 0003):

  • ❌ Merge conflicts when multiple developers create migrations simultaneously
  • ✅ Deterministic ordering

Timestamp-Only (20251011120000):

  • ✅ No merge conflicts
  • ❌ Ordering depends on creation time, not merge order

The Solution: Hybrid Versioning

Development: Timestamps (avoid conflicts)
Production: Sequential (deterministic order)
Conversion: Automated via sqlspec fix command

# Developer creates migration with timestamp
$ sqlspec create-migration -m "add users table"
Created: 20251011120000_add_users.sql

# CI converts to sequential before merge
$ sqlspec fix
✓ Converted 20251011120000_add_users.sql → 0003_add_users.sql
✓ Updated database tracking: 20251011120000 → 0003

Features Implemented

1. Version Conversion Utilities (sqlspec/utils/version.py)

  • generate_conversion_map() - Maps timestamp → sequential with namespace separation
  • convert_to_sequential_version() - Converts with extension prefix preservation
  • get_next_sequential_number() - Finds next available number per namespace
  • Supports core + extension namespaces (ext_litestar_0001, ext_adk_0001)
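The mapping step can be sketched roughly as follows. This is not the code in sqlspec/utils/version.py; `split_version`, `is_timestamp`, and `build_conversion_map` are hypothetical names, and the sketch assumes a timestamp is always 14 digits and an extension prefix always looks like `ext_<name>_`.

```python
import re
from collections import defaultdict

# Assumed shape: optional "ext_<name>_" prefix, then a run of digits.
_VERSION_RE = re.compile(r"^(?P<prefix>ext_[a-z0-9]+_)?(?P<num>\d+)$")


def split_version(version: str) -> tuple[str, str]:
    """Return (namespace prefix, numeric part); the core namespace is ''."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return match.group("prefix") or "", match.group("num")


def is_timestamp(num: str) -> bool:
    """Assumption: 14 digits (YYYYMMDDHHmmss) means a timestamp version."""
    return len(num) == 14


def build_conversion_map(versions: list[str]) -> dict[str, str]:
    """Map each timestamp version to the next free sequential number in its namespace."""
    highest: dict[str, int] = defaultdict(int)
    pending: dict[str, list[str]] = defaultdict(list)
    for version in versions:
        prefix, num = split_version(version)
        if is_timestamp(num):
            pending[prefix].append(num)
        else:
            highest[prefix] = max(highest[prefix], int(num))

    mapping: dict[str, str] = {}
    for prefix, stamps in pending.items():
        for stamp in sorted(stamps):  # chronological order within the namespace
            highest[prefix] += 1
            mapping[f"{prefix}{stamp}"] = f"{prefix}{highest[prefix]:04d}"
    return mapping
```

Each namespace keeps its own counter, so a core timestamp and an extension timestamp never compete for the same sequential number.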

2. Atomic File Operations (sqlspec/migrations/fix.py)

  • MigrationFixer class with full backup/rollback capability
  • Collision detection before execution
  • SQL content updates (not just filenames)
  • Dry-run support with Rich table preview
  • Atomic operations with automatic cleanup
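A minimal sketch of the backup, collision check, and rollback flow listed above. It is not the MigrationFixer implementation: `rename_with_backup` is a hypothetical helper, the backup lives in a temporary directory rather than next to the migrations, and the rollback is best-effort rather than truly atomic.

```python
import shutil
import tempfile
from pathlib import Path


def rename_with_backup(migrations_dir: Path, renames: dict[str, str]) -> None:
    """Rename files in migrations_dir; restore the original state if a rename fails."""
    # Collision detection before touching the filesystem.
    if len(set(renames.values())) != len(renames):
        raise ValueError("two migrations map to the same target name")
    for old_name, new_name in renames.items():
        if not (migrations_dir / old_name).exists():
            raise FileNotFoundError(old_name)
        if (migrations_dir / new_name).exists():
            raise FileExistsError(new_name)

    backup_dir = Path(tempfile.mkdtemp(prefix="migrations_backup_"))
    try:
        shutil.copytree(migrations_dir, backup_dir / "snapshot")
        try:
            for old_name, new_name in renames.items():
                (migrations_dir / old_name).rename(migrations_dir / new_name)
        except Exception:
            # Roll back: replace the directory with the snapshot taken above.
            shutil.rmtree(migrations_dir)
            shutil.copytree(backup_dir / "snapshot", migrations_dir)
            raise
    finally:
        shutil.rmtree(backup_dir, ignore_errors=True)  # cleanup on success and failure
```

Validating every rename before the first one runs keeps the common failure (a name collision) from ever needing the rollback path.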

3. Database Synchronization (sqlspec/migrations/tracker.py)

Automatically updates the database tracking table during conversion:

def update_version_record(driver, old_version: str, new_version: str):
    """Update migration version record from timestamp to sequential.
    
    Idempotent: If already updated, logs and continues without error.
    Allows fix command to be safely re-run after pulling changes.
    """

Benefits:

  • Prevents "missing migration" errors in developer environments
  • Safe to re-run after pulling changes from main
  • Maintains database consistency with filesystem
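The idempotent behaviour described above could look roughly like this. The sketch uses the standard-library sqlite3 module in place of a SQLSpec driver, so the `conn` parameter, the boolean return value, and the `LookupError` for a missing record are assumptions, not the tracker's actual API.

```python
import sqlite3


def update_version_record(conn: sqlite3.Connection, old_version: str, new_version: str) -> bool:
    """Rename a tracked version; return True if a row changed, False if already done."""
    cursor = conn.execute(
        "UPDATE sqlspec_migration_version "
        "SET version_num = ?, version_type = 'sequential' WHERE version_num = ?",
        (new_version, old_version),
    )
    if cursor.rowcount > 0:
        conn.commit()
        return True
    # No row matched: either the rename already happened or the record is missing.
    already_done = conn.execute(
        "SELECT 1 FROM sqlspec_migration_version WHERE version_num = ?",
        (new_version,),
    ).fetchone()
    if already_done:
        return False  # idempotent no-op, safe on re-run
    raise LookupError(f"no tracking record for {old_version!r} or {new_version!r}")
```

Running the function twice with the same arguments updates the row once and reports a no-op the second time, which is what lets the fix command be re-run after pulling changes.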

4. Checksum Canonicalization (sqlspec/migrations/runner.py)

Excludes -- name: migrate-{version}-up/down headers from checksum computation:

  • Checksums remain stable after version conversion
  • No false drift warnings after fix command
  • Validate command passes cleanly post-fix
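A rough sketch of the canonicalization idea, assuming the header lines look exactly like `-- name: migrate-<version>-up` or `-down`. `canonical_checksum` is a hypothetical name and SHA-256 is an assumed hash; the runner may use a different function and algorithm.

```python
import hashlib
import re

# Assumed header shape: "-- name: migrate-<version>-up" or "...-down" on its own line.
_HEADER_RE = re.compile(r"^--\s*name:\s*migrate-[\w-]+-(?:up|down)[ \t]*$", re.MULTILINE)


def canonical_checksum(sql_text: str) -> str:
    """Hash the migration text with version-bearing header lines blanked out."""
    canonical = _HEADER_RE.sub("", sql_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because only the header lines are dropped, renaming `migrate-20251011120000-up` to `migrate-0003-up` leaves the hash unchanged, while any edit to the SQL body still changes it.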

5. CLI Integration (sqlspec/cli.py, sqlspec/migrations/commands.py)

sqlspec fix [OPTIONS]

Options:
  --dry-run              Preview changes without applying
  --no-database          Skip database tracking updates
  --yes                  Skip confirmation prompt (CI mode)

Key Features

Feature             | Description
--------------------|----------------------------------------------------
Database Sync       | Idempotent update_version_record() keeps DB in sync
File Content        | Updates SQL query names, not just filenames
Backup/Rollback     | Automatic backup with atomic rollback on failure
Extension Support   | Independent sequences per extension namespace
Collision Detection | Pre-validation before rename prevents conflicts
Checksum Stability  | Canonicalized checksums remain stable post-fix
Idempotency         | Safe to re-run multiple times without errors
Rich CLI            | Interactive previews with confirmation prompts
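The file content update (rewriting SQL query names for one specific version) might be sketched like this. `update_query_names` is a hypothetical helper, not the function in fix.py; it only assumes the `-- name: migrate-<version>-up/down` header shape and the use of `re.escape` on the old version.

```python
import re


def update_query_names(sql_text: str, old_version: str, new_version: str) -> str:
    """Rewrite '-- name: migrate-<old>-up/down' headers for exactly one version."""
    pattern = re.compile(
        rf"^(--\s*name:\s*migrate-){re.escape(old_version)}(-(?:up|down))[ \t]*$",
        re.MULTILINE,
    )
    # A function replacement avoids treating new_version as a regex template.
    return pattern.sub(lambda m: f"{m.group(1)}{new_version}{m.group(2)}", sql_text)
```

Anchoring the pattern to the full header line and escaping the old version means a neighbouring header such as `migrate-20251011120001-up` is left untouched.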

Test Coverage

248/248 tests passing (100% success rate)

New Tests Added (52 tests):

  • test_checksum_canonicalization.py - 16 tests for stable checksums
  • test_fix_regex_precision.py - 13 tests for version-specific replacement
  • test_tracker_idempotency.py - 14 tests for idempotent database updates
  • test_fix_checksum_stability.py - 3 integration tests
  • test_fix_idempotency_workflow.py - 6 workflow integration tests

Coverage Metrics:

  • sqlspec/utils/version.py: 100%
  • sqlspec/migrations/fix.py: 99%
  • sqlspec/migrations/tracker.py: 82%
  • Overall: 99%+ on core modules

Documentation

User Guide

Hybrid Versioning Guide (15KB, 615 lines):

  • Problem statement and solution explanation
  • Step-by-step workflow examples
  • CI/CD integration (GitHub Actions, GitLab CI)
  • Troubleshooting section
  • Before/after comparisons

CLI Reference

CLI Documentation - Complete fix command reference

CI/CD Examples

GitHub Actions:

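name: Fix Migrations
on:
  pull_request:
    branches: [main]
    paths: ['migrations/**']

permissions:
  contents: write  # the job pushes the converted files back to the PR branch

jobs:
  fix-migrations:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.head_ref }}  # check out the PR branch, not the detached merge commit
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install sqlspec
      - run: sqlspec fix --yes
      - name: Commit converted migrations (only if something changed)
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add migrations/
          git diff --cached --quiet || git commit -m "fix: convert migrations to sequential"
          git push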

GitLab CI:

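fix-migrations:
  stage: migrations
  script:
    - sqlspec fix --yes
    - git add migrations/
    # Quoted so the ": " in the commit message is not parsed as a YAML mapping;
    # commits only when the fix actually changed something
    - 'git diff --cached --quiet || git commit -m "fix: convert migrations to sequential"'
    # CI checks out a detached HEAD, so push it to the branch explicitly
    - git push origin "HEAD:$CI_COMMIT_REF_NAME"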

Breaking Changes

None. The feature is opt-in and backward compatible.

  • Existing sequential migrations continue to work
  • Timestamp migrations can coexist with sequential
  • Fix command is explicitly invoked, not automatic
  • Extension migrations ship as sequential (production-ready)

Migration Workflow

Development Phase

# Alice creates migration
$ sqlspec create-migration -m "add products table"
Created: 20251011120000_add_products.sql

# Bob creates migration (same time, different PR)
$ sqlspec create-migration -m "add orders table"
Created: 20251011120500_add_orders.sql

# No conflicts! Different timestamps.

Pre-Merge CI Check

# CI converts timestamps to sequential
$ sqlspec fix --dry-run
Preview:
┌─────────────────────────────┬────────────────────────┐
│ Current Version             │ New Version            │
├─────────────────────────────┼────────────────────────┤
│ 20251011120000_add_products │ 0003_add_products      │
└─────────────────────────────┴────────────────────────┘

$ sqlspec fix --yes
✓ Created backup
✓ Renamed 1 migration file
✓ Updated 1 database record
✓ Cleanup complete

# CI commits and pushes
$ git add migrations/ && git commit -m "fix: convert migrations to sequential" && git push

Production Deployment

# Production sees only sequential migrations
$ sqlspec migrate
✓ Applied 0003_add_products.sql

Implementation Details

Extension Migration Support

Extensions ship with sequential migrations (production-ready):

sqlspec/extensions/litestar/migrations/
├── 0001_create_session_table.py       # Sequential format
└── __init__.py

sqlspec/extensions/adk/migrations/
├── 0001_create_adk_tables.py          # Sequential format
└── __init__.py

Independent numbering per namespace prevents collisions:

  • Core: 0001, 0002, 0003
  • Litestar: ext_litestar_0001, ext_litestar_0002
  • ADK: ext_adk_0001, ext_adk_0002

Database Schema

Tracking table supports both formats:

CREATE TABLE sqlspec_migration_version (
    version_num VARCHAR(32) PRIMARY KEY,     -- Supports both formats
    version_type VARCHAR(16),                -- 'sequential' or 'timestamp'
    execution_sequence INTEGER,              -- Actual application order
    description TEXT,
    applied_at TIMESTAMP,
    execution_time_ms INTEGER,
    checksum VARCHAR(64),
    applied_by VARCHAR(255)
);
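How `version_type` and `execution_sequence` get filled in when a migration is recorded can be sketched as follows. This uses sqlite3 and a reduced copy of the schema; `record_migration` here is a hypothetical helper, and the rule that a 14-digit version is a timestamp is an assumption.

```python
import sqlite3

# Reduced copy of the tracking table, enough for the sketch.
SCHEMA = """
CREATE TABLE sqlspec_migration_version (
    version_num VARCHAR(32) PRIMARY KEY,
    version_type VARCHAR(16),
    execution_sequence INTEGER,
    description TEXT
)
"""


def record_migration(conn: sqlite3.Connection, version: str, description: str) -> int:
    """Insert a tracking row and return the execution_sequence it was given."""
    # Assumption: a 14-digit numeric part is a timestamp, anything else is sequential.
    numeric = version.rsplit("_", 1)[-1]
    is_ts = len(numeric) == 14 and numeric.isdigit()
    version_type = "timestamp" if is_ts else "sequential"
    # The next sequence number is one past the highest recorded so far.
    (next_seq,) = conn.execute(
        "SELECT COALESCE(MAX(execution_sequence), 0) + 1 FROM sqlspec_migration_version"
    ).fetchone()
    conn.execute(
        "INSERT INTO sqlspec_migration_version "
        "(version_num, version_type, execution_sequence, description) VALUES (?, ?, ?, ?)",
        (version, version_type, next_seq, description),
    )
    conn.commit()
    return next_seq
```

`execution_sequence` records the order in which migrations were actually applied, independent of how the version string sorts, which is why it can stay unchanged when a timestamp version is later renamed to a sequential one.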

Idempotent Design

The fix command can be safely re-run:

  • Checks if version already updated before attempting conversion
  • Logs and continues if already in sequential format
  • No duplicate updates or errors on re-execution
  • Essential for CI/CD workflows where developers pull changes

Checksum Stability

Checksums exclude version headers to remain stable:

# Before fix: checksum includes version
# -- name: migrate-20251011120000-up
# CREATE TABLE users (...);

# After fix: same checksum (header excluded)
# -- name: migrate-0003-up
# CREATE TABLE users (...);

Commits

  1. 7b6c9759 - feat: implement hybrid versioning fix command
  2. ff006c6a - fix: revert extension migrations to sequential format
  3. 5293387a - refactor: improve fix command robustness and safety
  4. 0bdfbebe - test: add comprehensive tests for hybrid versioning

Files Changed

Implementation (7 files, +569 lines):

  • sqlspec/migrations/fix.py (new, 215 lines)
  • sqlspec/utils/version.py (new, 170 lines)
  • sqlspec/migrations/commands.py (+171 lines)
  • sqlspec/migrations/tracker.py (+72 lines)
  • sqlspec/migrations/base.py (+24 lines)
  • sqlspec/migrations/runner.py (+9 lines)
  • sqlspec/cli.py (+22 lines)

Documentation (4 files, +784 lines):

  • docs/guides/migrations/hybrid-versioning.md (new, 615 lines)
  • docs/usage/cli.rst (+109 lines)
  • docs/changelog.rst (+52 lines)
  • docs/guides/README.md (+8 lines)

Tests (7 files, +1573 lines):

  • tests/unit/test_migrations/test_version_conversion.py (new, 303 lines)
  • tests/unit/test_migrations/test_checksum_canonicalization.py (new, 412 lines)
  • tests/unit/test_migrations/test_fix_regex_precision.py (new, 339 lines)
  • tests/unit/test_migrations/test_tracker_idempotency.py (new, 324 lines)
  • tests/integration/test_migrations/test_fix_file_operations.py (new, 376 lines)
  • tests/integration/test_migrations/test_fix_checksum_stability.py (new, 90 lines)
  • tests/integration/test_migrations/test_fix_idempotency_workflow.py (new, 129 lines)

Total: 18 files changed, +2,926 lines


Quality Metrics

  • Tests: 248/248 passing (100%)
  • Coverage: 99%+ on core modules
  • Type Safety: Mypy strict + Pyright clean
  • Linting: Ruff check + format clean
  • Anti-patterns: None detected
  • Documentation: Comprehensive guide + CLI reference

Benefits

For Development Teams:

  • Zero PR conflicts on migration numbers
  • Parallel feature development without coordination
  • Clear chronological ordering of changes
  • Easy to understand migration history

For Production:

  • Deterministic sequential ordering
  • Predictable deployment behavior
  • Clean migration history
  • Standard numbered migrations

For CI/CD:

  • Automated conversion in pipeline
  • No manual intervention required
  • Idempotent operations (safe re-runs)
  • Database stays synchronized

cofin added a commit that referenced this pull request Oct 11, 2025
Update ADK migration documentation to reflect timestamp-based versioning
(YYYYMMDDHHmmss format) instead of sequential format.

Changes:
- Update version prefixing example (ext_adk_0001 → ext_adk_20251011120000)
- Update migration file examples to use timestamp format
- Remove references to sequential versioning (clean break)
- Clean up migration template path reference

This is a clean break documentation update - shows current state only,
no references to old sequential format.

Related: #116, #128
@cofin cofin force-pushed the feat/hybrid-versioning branch from 15b9f29 to cb2f6b0 on October 11, 2025 22:02

euri10 commented Oct 12, 2025

@cofin I don't mind switching to time-based; maybe you want to implement hybrid in a subsequent PR?


cofin commented Oct 12, 2025

> @cofin I don't mind switching to time-based; maybe you want to implement hybrid in a subsequent PR?

No, I think the hybrid approach is needed here as well. Let me play around with this a bit.

cofin added 9 commits October 12, 2025 12:19
Implement timestamp-based migration versioning to eliminate PR conflicts
when multiple developers create migrations concurrently. Uses format
YYYYMMDDHHmmss_description.sql instead of sequential 0001, 0002.

Breaking Changes:
- Migration file format changed from sequential (0001) to timestamp
- Existing migrations must be recreated with new timestamp format
- Database schema adds version_type and execution_sequence columns
- No automated migration path (clean break for pre-1.0)

Features Added:
- Timestamp-based version generation (UTC timezone)
- Out-of-order migration detection with configurable strict mode
- Mixed format support (sequential + timestamp during transition)
- Version-aware sorting and comparison
- Enhanced migration tracking with execution sequence

Implementation:
- Created sqlspec.utils.version module for version parsing/comparison
- Created sqlspec.migrations.validation module for out-of-order detection
- Updated migration commands to generate timestamp versions
- Enhanced database schema with version_type and execution_sequence
- Added comprehensive test coverage (32 new tests, 152 total passing)

Configuration:
- strict_ordering: False (default) - warn on out-of-order migrations
- allow_missing parameter in upgrade() for per-command override

Files Changed:
- sqlspec/utils/version.py (NEW) - Version parsing system
- sqlspec/migrations/validation.py (NEW) - Out-of-order detection
- sqlspec/exceptions.py - Added migration exceptions
- sqlspec/migrations/base.py - Schema and sorting updates
- sqlspec/migrations/tracker.py - Record new columns
- sqlspec/migrations/commands.py - Timestamp generation and validation
- tests/unit/test_migrations/test_version.py (NEW) - 17 tests
- tests/unit/test_migrations/test_validation.py (NEW) - 15 tests

Refs: #116

- Remove unnecessary type annotation quotes
- Use list.extend for better performance

Add version_type and execution_sequence columns to Oracle-specific
migration tracker to match base schema changes.

- Add version_type and execution_sequence to record_migration methods
- Parse version to determine version_type
- Query next execution_sequence before recording
- Update both sync and async implementations
- All type checking errors resolved (pyright 0 errors)

Fix NEXT_SEQ column name case in Oracle migration tracker.
Oracle returns column names in uppercase, so query for 'NEXT_SEQ'
instead of 'next_seq'.

Both sync and async implementations updated.

Update ADK migration documentation to reflect timestamp-based versioning
(YYYYMMDDHHmmss format) instead of sequential format.

Changes:
- Update version prefixing example (ext_adk_0001 → ext_adk_20251011120000)
- Update migration file examples to use timestamp format
- Remove references to sequential versioning (clean break)
- Clean up migration template path reference

This is a clean break documentation update - shows current state only,
no references to old sequential format.

Related: #116, #128

Renamed `0001_create_session_table.py` to `20251011215440_create_session_table.py`
to align with the new hybrid versioning system implemented in #116.

This ensures consistency across all migrations in the codebase, including
extension-provided migrations. The Litestar extension's session table migration
now follows the same YYYYMMDDHHmmss_description.py naming convention as
project migrations.

Renamed `0001_create_adk_tables.py` to `20251011215914_create_adk_tables.py`
to align with the new hybrid versioning system implemented in #116.

This completes the migration of all existing migrations to the new timestamp
format, ensuring consistency across project and extension migrations.
@cofin cofin force-pushed the feat/hybrid-versioning branch from cb2f6b0 to 0dd6839 on October 12, 2025 16:19
cofin added 10 commits October 12, 2025 20:30
Complete implementation of goose-style hybrid versioning with fix command
that converts timestamp migrations to sequential format for production.

Features:
- Version conversion utilities with separate namespace support
- File operations layer with atomic backup/rollback
- Fix command with dry-run, interactive, and CI modes
- Database synchronization preserving metadata
- Comprehensive documentation and CI examples

Implementation Details:

Phase 0-1: Foundation ✅
- Extension migration audit and verification
- Version conversion functions (get_next_sequential_number, convert_to_sequential_version, generate_conversion_map)
- 27 unit tests with 100% coverage

Phase 2: File Operations ✅
- MigrationFixer class with atomic operations
- Timestamped backup creation (.backup_YYYYMMDD_HHMM/)
- SQL content transformation (query name updates)
- Automatic rollback on errors
- Integration tests

Phase 3: Fix Command ✅
- Sync and async fix() methods in commands.py
- Rich table preview output
- Interactive confirmation prompt
- CLI integration: sqlspec migration fix [--dry-run] [--yes] [--no-database]
- End-to-end integration tests

Phase 4: Database Sync ✅
- update_version_record() for sync and async trackers
- Preserves execution_sequence and applied_at
- Updates version_num and version_type
- Transaction-based with rollback support
- Multi-adapter testing

Phase 5: Documentation ✅
- Comprehensive workflow guide (docs/guides/migrations/hybrid-versioning.md)
- CLI reference in docs/usage/cli.rst
- CI integration examples (GitHub Actions, GitLab CI)
- Troubleshooting guide
- Updated changelog

Test Results:
- 174 migration tests passing (27 new unit tests)
- Code coverage: 93% overall, 100% for new code
- All type checks pass (mypy, pyright)
- All linting passes (ruff, slotscheck)

Closes #116

Extension migrations ship with the package and represent production
migrations, so they should use sequential format (0001) not timestamps.

This aligns with the hybrid versioning approach where:
- Development: timestamps (e.g., 20251011215440)
- Production: sequential (e.g., 0001)

Since extension migrations are bundled with the package, they are
production migrations and should follow sequential numbering.

Implements three tactical improvements identified by expert AI review:

1. **Make update_version_record idempotent**
   - Check if new_version already exists when rows_affected == 0
   - Allows fix command to be safely re-run after pulling changes
   - Prevents errors when developer's local DB is already updated
   - Applied to both sync and async tracker implementations

2. **Canonicalize checksum computation**
   - Exclude '-- name: migrate-*' headers from checksum calculation
   - Ensures checksums remain stable across version conversions
   - Prevents checksum drift warnings after fix command
   - Solves issue where file content changes break checksums

3. **Tighten regex in update_file_content**
   - Create version-specific patterns using re.escape(old_version)
   - Prevents unintended replacements of other migrate-* patterns
   - More precise and safer file content updates

These improvements address concerns raised during consensus review
with Gemini Pro and GPT-5, making the hybrid versioning approach
more robust for production CI/CD workflows.

All 196 migration tests passing.

Add 52 new tests covering all aspects of the hybrid versioning fix command:

Unit Tests (43 tests):
- test_checksum_canonicalization.py (16 tests)
  * Verify checksums exclude migrate-* headers
  * Ensure stability after version conversion
  * Test edge cases: empty files, whitespace, extensions

- test_fix_regex_precision.py (13 tests)
  * Verify version-specific regex replacement
  * Ensure only target version is replaced
  * Test special characters and boundary conditions

- test_tracker_idempotency.py (14 tests)
  * Verify idempotent update_version_record
  * Test both sync and async implementations
  * Cover success, idempotent, and error paths

Integration Tests (9 tests):
- test_fix_checksum_stability.py (3 tests)
  * Validate checksums unchanged after fix
  * Ensure validate command passes post-fix
  * Test with complex SQL and multiple migrations

- test_fix_idempotency_workflow.py (6 tests)
  * Test CI workflow simulation
  * Test developer pull workflow
  * Test partial conversion recovery

All 248 migration tests passing with 99%+ coverage on core modules.

Implements automatic version reconciliation in upgrade command to improve
developer workflow when pulling renamed migrations from teammates.

Key changes:
- Added _synchronize_version_records() to sync/async commands
- Auto-detects renamed migrations and updates DB tracking
- Validates checksums before updating
- Added --no-auto-sync CLI flag
- Added migration_config.auto_sync config option
- Defaults to enabled for best UX

Developers can now just run migrate after pulling changes without
manually running fix command.

Related: #128

- Add _synchronize_version_records() method to both sync and async commands
- Auto-detects renamed migrations and updates DB tracking before applying
- Validates checksums match before updating to prevent incorrect matches
- Add --no-auto-sync CLI flag to disable automatic reconciliation
- Add migration_config.auto_sync config option for project-wide control
- Defaults to enabled for best developer experience

This allows developers to just run 'migrate' after pulling renamed
migrations from teammates, without manually running 'fix' first.

Follows Option B (Pre-Migration Sync Check) as recommended by
expert consensus (Gemini Pro 9/10 confidence).

Related: #128

Update test assertions to expect auto_sync=True parameter in upgrade() calls.
All CLI tests now pass with the new auto-sync feature.

The guides/ directory contains internal documentation and playbooks that
should not be rendered in the Sphinx documentation. Added guides/** to
exclude_patterns and removed duplicate exclude_patterns definition.

Fixes warnings about guides/ files not being included in any toctree.
@cofin cofin changed the title from "feat: implement timestamp-based migration versioning" to "feat: implement hybrid timestamp/sequence based migration versioning" on Oct 13, 2025
@cofin cofin linked an issue Oct 17, 2025 that may be closed by this pull request
4 tasks
Development

Successfully merging this pull request may close these issues.

  • Bug: duplicate loading ?
  • Enhancement: migration, hybrid versioning