Chore Summary: Enterprise-Scale Scalability & Soak-Test Harness
Introduce a production-realistic test harness that validates Gateway performance under massive enterprise-scale load with tiered wave testing from small deployments to million-user enterprises:
- Tiered dataset seeding across 4 waves: Small → Medium → Large → Enterprise (up to 1M users, 5M teams, 50K tools)
- Multi-layer load testing using Locust (HTTP API), pytest-benchmark (service layer), and smocker (mocked MCP servers)
- Federation & caching stress testing with L1/L2 cache validation under extreme load
- Multi-tenancy scale testing validating private/team/global scope performance with millions of entities
- Comprehensive reporting with Grafana dashboards, flamegraphs, and enterprise capacity planning
Wave Matrix & Target Datasets
Wave | Servers | Tools | Users | Teams | Metrics Retention | Max Users / Team | Load Test Duration |
---|---|---|---|---|---|---|---|
Small | 100 | 500 | 10,000 | 50,000 | 90 days | 50,000 | 15 minutes |
Medium | 1,000 | 2,500 | 100,000 | 500,000 | 1 year | 100,000 | 30 minutes |
Large | 5,000 | 12,500 | 500,000 | 2,500,000 | 3 years | 500,000 | 60 minutes |
Enterprise | 10,000 | 50,000 | 1,000,000 | 5,000,000 | 5 years | 1,000,000 | 120 minutes |
Enterprise Stability | 10,000 | 50,000 | 1,000,000 | 5,000,000 | 5 years | 1,000,000 | 48 hours |
Rule of thumb: Teams ≈ Users × 5 (every user owns 1 private team + belongs to ~4 shared teams)
A 48-hour stability test will run to assess potential memory leaks and infrastructure stability.
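For quick sizing checks, the rule of thumb can be restated directly in code. The snippet below is an illustrative calculation only (not part of the seeder) that reproduces the Teams column of the matrix from the Users column.

```python
# Illustrative check of the "Teams ≈ Users × 5" rule against the wave matrix above.
WAVE_USERS = {"small": 10_000, "medium": 100_000, "large": 500_000, "enterprise": 1_000_000}

PRIVATE_TEAMS_PER_USER = 1  # every user owns one private team
SHARED_TEAMS_PER_USER = 4   # the matrix budgets ~4 additional shared teams per user

for wave, users in WAVE_USERS.items():
    teams = users * (PRIVATE_TEAMS_PER_USER + SHARED_TEAMS_PER_USER)
    print(f"{wave:>10}: {users:>9,} users -> {teams:>9,} teams")
# small -> 50,000, medium -> 500,000, large -> 2,500,000, enterprise -> 5,000,000
```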
Areas Affected
- Make targets → `make seed-{wave}`, `make soak-test-{wave}`, `make federation-load-enterprise`, `make flamegraph-analysis`
- CI / GitHub Actions → scheduled (weekly) wave testing with matrix (PostgreSQL/Redis, caching on/off)
- Docker Compose → Locust cluster, smocker mock services, Redis cluster, PostgreSQL tuning
- Test infrastructure → enterprise data seeder, realistic user scenarios, federation mocking at scale
- Monitoring → load-test Grafana dashboard, cache performance metrics, federation health, memory tracking
- Documentation → comprehensive scalability guide with enterprise capacity planning
Context / Rationale
Current performance unknowns that block enterprise deployment decisions:
Critical Question | Today | After This Epic |
---|---|---|
Million-user multi-tenancy performance? | Unknown | Query performance across 5M teams |
50K tools federation latency patterns? | Unknown | Network overhead, cache efficiency |
L1/L2 cache behavior with 100GB+ datasets? | Unknown | Memory usage, eviction patterns |
Database scaling to 500M+ records? | Unknown | Connection pooling, query optimization |
Memory leak patterns over 48h at enterprise scale? | Unknown | RSS trends, GC patterns, cache bloat |
Real-world enterprise scenarios tested:
- Global enterprise: 50,000 tools across 5,000,000 teams
- Federation mesh of 100+ external MCP gateways with realistic failure rates
- 2,000+ concurrent API clients with mixed workloads (read-heavy, write-heavy, federation)
- Cache warming with 10GB+ datasets, invalidation storms, federation failover
Enhanced Architecture Design
Enterprise Test Environment Stack:
```mermaid
flowchart TD
%% Load Generation Cluster
subgraph "Load Generation Cluster"
LOCUST[Locust Controller<br/>Web UI :8089]
L1[Locust Worker 1<br/>Enterprise API Load]
L2[Locust Worker 2<br/>Federation Load]
L3[Locust Worker 3<br/>Multi-tenant Load]
L4[Locust Worker 4<br/>Cache Stress Load]
L5[Locust Worker 5<br/>Write-Heavy Load]
LOCUST --> L1
LOCUST --> L2
LOCUST --> L3
LOCUST --> L4
LOCUST --> L5
end
%% Mock Services Federation
subgraph "Mock Services Federation"
SMOCKER[Smocker Controller<br/>:8080]
MOCK1[Mock MCP Cluster 1<br/>Tools Provider x20]
MOCK2[Mock MCP Cluster 2<br/>Resources Provider x20]
MOCK3[Mock Gateway Fed<br/>Federated Peers x50]
MOCK4[Mock Enterprise APIs<br/>External Systems x10]
SMOCKER --> MOCK1
SMOCKER --> MOCK2
SMOCKER --> MOCK3
SMOCKER --> MOCK4
end
%% Gateway Cluster Under Test
subgraph "Gateway Cluster Under Test"
LB[Load Balancer<br/>nginx :4444]
GW1[Gateway Instance 1<br/>+L1 Cache 1GB]
GW2[Gateway Instance 2<br/>+L1 Cache 1GB]
GW3[Gateway Instance 3<br/>+L1 Cache 1GB]
GW4[Gateway Instance 4<br/>+L1 Cache 1GB]
LB --> GW1
LB --> GW2
LB --> GW3
LB --> GW4
end
%% Data Layer - Enterprise Scale
subgraph "Data Layer - Enterprise Scale"
REDIS_M[Redis Master<br/>L2 Cache + Sessions]
REDIS_S1[Redis Slave 1<br/>Read Replica]
REDIS_S2[Redis Slave 2<br/>Read Replica]
PG_M[(PostgreSQL Master<br/>50GB+ Primary Data)]
PG_S1[(PostgreSQL Slave 1<br/>Read Replica)]
PG_S2[(PostgreSQL Slave 2<br/>Read Replica)]
REDIS_M --> REDIS_S1
REDIS_M --> REDIS_S2
PG_M --> PG_S1
PG_M --> PG_S2
GW1 --> REDIS_M
GW2 --> REDIS_M
GW3 --> REDIS_M
GW4 --> REDIS_M
GW1 --> PG_M
GW2 --> PG_M
GW3 --> PG_M
GW4 --> PG_M
end
%% Monitoring & Analysis
subgraph "Monitoring & Analysis"
PROM[Prometheus<br/>High-Resolution Metrics]
GRAF[Grafana<br/>Enterprise Dashboard]
PYSPY[py-spy Cluster<br/>Distributed Profiling]
ELASTIC[Elasticsearch<br/>Log Aggregation]
PROM --> GRAF
GW1 --> PROM
GW2 --> PROM
GW3 --> PROM
GW4 --> PROM
GW1 --> ELASTIC
GW2 --> ELASTIC
GW3 --> ELASTIC
GW4 --> ELASTIC
end
%% Connections
L1 --> LB
L2 --> LB
L3 --> LB
L4 --> LB
L5 --> LB
GW1 --> MOCK1
GW1 --> MOCK2
GW1 --> MOCK3
GW1 --> MOCK4
GW2 --> MOCK1
GW2 --> MOCK2
GW2 --> MOCK3
GW2 --> MOCK4
GW3 --> MOCK1
GW3 --> MOCK2
GW3 --> MOCK3
GW3 --> MOCK4
GW4 --> MOCK1
GW4 --> MOCK2
GW4 --> MOCK3
GW4 --> MOCK4
classDef load fill:#ffeb3b
classDef mock fill:#4caf50
classDef gateway fill:#2196f3
classDef data fill:#ff9800
classDef monitor fill:#9c27b0
class LOCUST,L1,L2,L3,L4,L5 load
class SMOCKER,MOCK1,MOCK2,MOCK3,MOCK4 mock
class LB,GW1,GW2,GW3,GW4 gateway
class REDIS_M,REDIS_S1,REDIS_S2,PG_M,PG_S1,PG_S2 data
class PROM,GRAF,PYSPY,ELASTIC monitor
```
Enhanced Acceptance Criteria
# | Criteria | Validation Method |
---|---|---|
1 | Enterprise dataset: 50K tools, 10K servers, 5M teams, 1M users, 5 years metrics | make seed-enterprise completes < 60 min |
2 | Wave testing: All 4 waves (Small → Enterprise) with progressive load increase | make soak-test-all-waves runs 4-hour test cycle |
3 | Cache performance at scale: L1/L2 with 10GB+ datasets, 95%+ hit ratio | Grafana shows cache metrics under enterprise load |
4 | Federation stress test: 100+ mocked external gateways with realistic patterns | Smocker validates federated call patterns at scale |
5 | Multi-tenancy at scale: 5M teams query performance, scope isolation | Benchmark report shows <10% overhead vs single-tenant |
6 | CI integration: Scheduled (weekly) wave testing with enterprise-scale matrix | GitHub Actions uploads comprehensive reports + flamegraphs |
Comprehensive Task List
Phase 1: Enterprise Data Seeder with Wave System
- 1.1 Wave-based seeder script (`scripts/seed_enterprise.py`)

```python
# Enterprise-scale realistic data generation with wave system
import asyncio
import logging

import click

logger = logging.getLogger(__name__)

WAVE_CONFIGS = {
    'small': {'servers': 100, 'tools': 500, 'users': 10_000, 'teams': 50_000,
              'metrics_days': 90, 'max_team_size': 50_000},
    'medium': {'servers': 1_000, 'tools': 2_500, 'users': 100_000, 'teams': 500_000,
               'metrics_days': 365, 'max_team_size': 100_000},
    'large': {'servers': 5_000, 'tools': 12_500, 'users': 500_000, 'teams': 2_500_000,
              'metrics_days': 1095, 'max_team_size': 500_000},
    'enterprise': {'servers': 10_000, 'tools': 50_000, 'users': 1_000_000, 'teams': 5_000_000,
                   'metrics_days': 1825, 'max_team_size': 1_000_000},
}

@click.command()
@click.option('--wave', type=click.Choice(['small', 'medium', 'large', 'enterprise']),
              default='small', help='Scale wave to generate')
@click.option('--parallel-workers', default=8, help='Parallel workers for data generation')
@click.option('--batch-size', default=10000, help='Batch size for bulk operations')
def seed_wave(wave: str, parallel_workers: int, batch_size: int):
    """Generate enterprise-scale test data for the specified wave."""
    config = WAVE_CONFIGS[wave]
    logger.info(f"Seeding {wave} wave: {config}")
    asyncio.run(generate_wave_data(config, parallel_workers, batch_size))
```
- 1.2 Enterprise data patterns with realistic distributions (a sampling sketch follows the block)

```python
# Realistic enterprise distribution patterns
ENTERPRISE_PATTERNS = {
    'team_size_distribution': {
        'micro': (1, 5, 0.6),                   # 60% micro teams (1-5 users)
        'small': (6, 25, 0.25),                 # 25% small teams (6-25 users)
        'medium': (26, 100, 0.10),              # 10% medium teams (26-100 users)
        'large': (101, 1000, 0.04),             # 4% large teams (101-1000 users)
        'enterprise': (1001, 1_000_000, 0.01),  # 1% enterprise teams (1001+ users)
    },
    'tools_per_team_distribution': {
        'light': (1, 10, 0.5),       # 50% teams: 1-10 tools
        'moderate': (11, 50, 0.3),   # 30% teams: 11-50 tools
        'heavy': (51, 200, 0.15),    # 15% teams: 51-200 tools
        'power': (201, 1000, 0.05),  # 5% teams: 201-1000 tools
    },
    'federation_patterns': {
        'hub_spoke': 0.4,  # 40% use hub-spoke federation
        'full_mesh': 0.3,  # 30% use full-mesh federation
        'tiered': 0.2,     # 20% use tiered federation
        'isolated': 0.1,   # 10% no federation
    },
    'cache_access_patterns': {
        'hot_data': 0.8,    # 80% of requests hit hot data (5% of total)
        'warm_data': 0.15,  # 15% of requests hit warm data (15% of total)
        'cold_data': 0.05,  # 5% of requests hit cold data (80% of total)
    },
}
```
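A minimal sketch of how the seeder might sample from the `(min, max, weight)` buckets above, reusing `ENTERPRISE_PATTERNS` from the block; `draw_team_size` is an illustrative helper, not an existing function in the codebase.

```python
import random

def draw_team_size(distribution: dict[str, tuple[int, int, float]]) -> int:
    """Pick a bucket by weight, then a uniform team size inside that bucket."""
    buckets = list(distribution.values())
    weights = [weight for _lo, _hi, weight in buckets]
    lo, hi, _ = random.choices(buckets, weights=weights, k=1)[0]
    return random.randint(lo, hi)

# Example: sample five team sizes from the enterprise distribution above
sizes = [draw_team_size(ENTERPRISE_PATTERNS['team_size_distribution']) for _ in range(5)]
```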
- 1.3 High-performance bulk loading for enterprise scale (one of the helpers is sketched after the block)

```python
# Optimized bulk loading for million-scale datasets
import asyncio

async def bulk_load_enterprise_wave(config: dict, workers: int, batch_size: int):
    """High-performance parallel bulk loading."""
    # Phase 1: Generate core entities in parallel
    async with asyncio.TaskGroup() as tg:
        tg.create_task(bulk_generate_users(config['users'], workers, batch_size))
        tg.create_task(bulk_generate_teams(config['teams'], workers, batch_size))
        tg.create_task(bulk_generate_tools(config['tools'], workers, batch_size))
        tg.create_task(bulk_generate_servers(config['servers'], workers, batch_size))

    # Phase 2: Generate relationships with realistic patterns
    await bulk_generate_team_memberships(config, batch_size=50_000)
    await bulk_generate_tool_associations(config, batch_size=100_000)

    # Phase 3: Generate historical metrics (most expensive)
    await bulk_generate_enterprise_metrics(
        config['metrics_days'],
        parallel_months=12,    # Generate 12 months in parallel
        batch_size=1_000_000,  # 1M metrics per batch
    )

    # Phase 4: Generate federation mesh
    await bulk_generate_federation_topology(config, pattern='enterprise')
```
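The `bulk_generate_*` helpers above are not shown in this issue. A minimal sketch of one of them, assuming SQLAlchemy 2.x async sessions and an `EmailUser` model (both assumptions about the Gateway schema, not confirmed names):

```python
from sqlalchemy import insert

# Sketch only: async_session_factory and EmailUser are assumed names, not the
# Gateway's actual session factory or model. Parallelism across `workers` is
# omitted for brevity; each batch is inserted with a single executemany call.
async def bulk_generate_users(total: int, workers: int, batch_size: int) -> None:
    async with async_session_factory() as session:
        for start in range(0, total, batch_size):
            rows = [
                {"email": f"user{i}@example.com", "full_name": f"User {i}"}
                for i in range(start, min(start + batch_size, total))
            ]
            await session.execute(insert(EmailUser), rows)
            await session.commit()
```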
Phase 2: Smocker Integration for Enterprise Federation Testing
- 2.1 Enterprise MCP federation mocking (`docker-compose.enterprise.yml`)

```yaml
# Enterprise-scale federation testing
smocker:
  image: thiht/smocker:latest
  ports:
    - "8080:8080"
    - "8081:8081"
  environment:
    - SMOCKER_MAX_MOCKS=10000    # Support 10K mock endpoints
    - SMOCKER_MEMORY_LIMIT=4GB   # Handle large mock datasets
  volumes:
    - ./loadtest/mocks:/opt/mocks

# Mock MCP federation cluster (100 gateways)
mock-federation-cluster:
  image: thiht/smocker:latest
  deploy:
    replicas: 20                 # 20 smocker instances
  environment:
    - SMOCKER_FEDERATION_SIZE=100
    - SMOCKER_LATENCY_RANGE=50-2000ms
    - SMOCKER_RELIABILITY=0.95-0.999
  volumes:
    - ./loadtest/mocks/federation-enterprise.yml:/opt/mocks/config.yml

# Mock enterprise external APIs
mock-enterprise-apis:
  image: thiht/smocker:latest
  deploy:
    replicas: 5
  environment:
    - SMOCKER_ENTERPRISE_APIS=true
    - SMOCKER_RATE_LIMIT=1000rps
  volumes:
    - ./loadtest/mocks/enterprise-apis.yml:/opt/mocks/config.yml
```
- 2.2 Enterprise federation scenario mocks (`loadtest/mocks/federation-enterprise.yml`)

```yaml
# Enterprise federation patterns with realistic failure modes
- request:
    method: POST
    path: /v1/tools
    headers:
      x-gateway-region: "us-east"
  response:
    status: 200
    delay: 75ms          # US East latency
    body: |
      {
        "tools": {{range 1000}} {"name": "enterprise_tool_{{.}}", "description": "Enterprise tool {{.}}"}, {{end}}
      }

# Regional latency simulation
- request:
    method: POST
    path: /v1/tools
    headers:
      x-gateway-region: "eu-central"
  response:
    status: 200
    delay: 150ms         # EU Central latency

# Failure scenarios (5% failure rate)
- request:
    method: POST
    path: /v1/tools
    headers:
      x-test-scenario: "partial_outage"
  response:
    status: 503
    delay: 30s
    body: |
      {"error": "Gateway temporarily unavailable", "retry_after": 30}

# Large dataset responses (cache stress testing)
- request:
    method: POST
    path: /v1/federation/bulk
  response:
    status: 200
    delay: 500ms
    body: |
      {
        "tools": {{range 10000}} {"name": "bulk_tool_{{.}}", "size": "{{multiply . 1024}}"}, {{end}}
      }
```
- 2.3 Dynamic enterprise mock management (a sketch of pushing these mocks to smocker follows the block)

```python
# Enterprise-scale mock management
import random

class EnterpriseMockManager:
    async def setup_enterprise_federation(self, gateway_count: int = 100):
        """Set up enterprise-scale federated gateway mocks."""
        # Create regional clusters
        regions = ['us-east', 'us-west', 'eu-central', 'eu-west', 'ap-south', 'ap-east']
        gateways_per_region = gateway_count // len(regions)

        for region in regions:
            await self.create_regional_cluster(
                region=region,
                gateway_count=gateways_per_region,
                base_latency=self.get_regional_latency(region),
                reliability=random.uniform(0.95, 0.999),
            )

    async def simulate_enterprise_failure_patterns(self):
        """Simulate realistic enterprise failure patterns."""
        failure_scenarios = [
            {'type': 'regional_outage', 'probability': 0.01, 'duration': '15m'},
            {'type': 'high_latency_spike', 'probability': 0.05, 'duration': '2m'},
            {'type': 'rate_limit_exceeded', 'probability': 0.02, 'duration': '30s'},
            {'type': 'partial_data_corruption', 'probability': 0.001, 'duration': '5m'},
        ]

        for scenario in failure_scenarios:
            if random.random() < scenario['probability']:
                await self.trigger_failure_scenario(scenario)
```
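How `create_regional_cluster()` actually registers definitions is left open above. One plausible approach, assuming smocker's admin API on port 8081 with its `POST /mocks` endpoint plus the `httpx` and `PyYAML` libraries, is to serialize a mock definition shaped like the ones in 2.2 and post it; treat the endpoint details as assumptions to verify against the smocker docs.

```python
import httpx
import yaml

# Sketch: register a single regional gateway mock with a smocker instance.
async def register_gateway_mock(admin_url: str, region: str, latency_ms: int) -> None:
    mock = [{
        "request": {
            "method": "POST",
            "path": "/v1/tools",
            "headers": {"x-gateway-region": region},
        },
        "response": {
            "status": 200,
            "delay": f"{latency_ms}ms",
            "body": '{"tools": []}',
        },
    }]
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{admin_url}/mocks",                      # assumed smocker admin endpoint
            content=yaml.safe_dump(mock),
            headers={"Content-Type": "application/x-yaml"},
        )
        resp.raise_for_status()
```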
Phase 3: Enterprise Load Testing Scenarios
- 3.1 Enterprise Locust scenarios (`locustfiles/enterprise_scale.py`; the `generate_enterprise_tool()` helper is sketched after the block)

```python
# Enterprise user behavior patterns
import random
import time

from locust import HttpUser, task

class EnterpriseUserBehavior(HttpUser):
    weight = 40  # Most common user type

    def on_start(self):
        """Initialize enterprise user context"""
        self.user_id = f"user_{random.randint(1, 1_000_000)}"
        self.private_team_id = f"private_{self.user_id}"
        self.shared_teams = random.sample(range(1, 5_000_000), k=random.randint(1, 8))
        self.tool_cache = []

    @task(25)
    def list_my_team_tools(self):
        """Most frequent: list tools for my primary team"""
        team_id = random.choice(self.shared_teams)
        response = self.client.get(f"/v1/tools?scope=team:{team_id}&limit=50")
        if response.status_code == 200:
            self.tool_cache = response.json().get('tools', [])[:10]

    @task(15)
    def search_global_tools(self):
        """Search across global tool catalog"""
        query = random.choice(['weather', 'translate', 'calculate', 'format', 'analyze'])
        self.client.get(f"/v1/tools/search?q={query}&scope=global&limit=100")

    @task(10)
    def access_federated_tools(self):
        """Access tools from federated gateways"""
        self.client.get("/v1/federation/tools?regions=us-east,eu-central&limit=50")

    @task(8)
    def create_private_tool(self):
        """Create tool in private workspace"""
        tool_data = self.generate_enterprise_tool()
        response = self.client.post("/v1/tools", json=tool_data)
        if response.status_code == 201:
            tool_id = response.json()['id']
            self.tool_cache.append(tool_id)

    @task(5)
    def share_tool_to_team(self):
        """Share private tool to team (triggers cache invalidation)"""
        if self.tool_cache:
            tool_id = random.choice(self.tool_cache)
            team_id = random.choice(self.shared_teams)
            self.client.post(f"/v1/tools/{tool_id}/share", json={"scope": f"team:{team_id}"})

    @task(2)
    def bulk_operations(self):
        """Bulk operations that stress the system"""
        team_id = random.choice(self.shared_teams)
        self.client.post("/v1/tools/bulk", json={"team_id": team_id, "action": "export", "limit": 1000})


class EnterprisePowerUser(HttpUser):
    weight = 10  # Power users with heavy operations

    @task(15)
    def complex_federation_query(self):
        """Complex queries across multiple federated gateways"""
        self.client.get("/v1/federation/aggregate?regions=all&include_metrics=true&timeframe=30d")

    @task(10)
    def team_administration(self):
        """Team management operations"""
        team_id = random.randint(1, 5_000_000)
        self.client.get(f"/v1/teams/{team_id}/members?limit=1000")
        self.client.get(f"/v1/teams/{team_id}/tools?include_private=true&limit=500")

    @task(8)
    def analytics_queries(self):
        """Heavy analytics queries"""
        self.client.get("/v1/analytics/usage?timeframe=90d&breakdown=team&limit=10000")


class CacheStressBehavior(HttpUser):
    weight = 5  # Cache invalidation stress testing

    @task
    def cache_invalidation_storm(self):
        """Rapid create/update/delete to stress cache invalidation"""
        operations = []

        # Create 50 tools rapidly
        for i in range(50):
            tool_data = {"name": f"stress_tool_{i}_{time.time()}", "url": "http://example.com"}
            response = self.client.post("/v1/tools", json=tool_data)
            if response.status_code == 201:
                operations.append(('created', response.json()['id']))

        # Update them all
        for op_type, tool_id in operations:
            if op_type == 'created':
                self.client.put(f"/v1/tools/{tool_id}", json={"description": f"Updated at {time.time()}"})
                operations.append(('updated', tool_id))

        # Delete half of them
        for i, (op_type, tool_id) in enumerate(operations):
            if op_type == 'updated' and i % 2 == 0:
                self.client.delete(f"/v1/tools/{tool_id}")
```
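`create_private_tool()` above calls a `generate_enterprise_tool()` helper that is not shown; a minimal sketch follows, where the payload fields are assumptions about the tool-creation API rather than the real schema.

```python
import random
import uuid

# Sketch of the missing helper referenced by create_private_tool().
def generate_enterprise_tool(self) -> dict:
    department = random.choice(["finance", "hr", "engineering", "sales", "support"])
    return {
        "name": f"{department}_tool_{uuid.uuid4().hex[:8]}",
        "description": f"Auto-generated {department} tool for load testing",
        "url": f"http://mock-mcp-{random.randint(1, 20)}:9000/tools",
    }

# Attach to the user class defined above (or paste it in as a method).
EnterpriseUserBehavior.generate_enterprise_tool = generate_enterprise_tool
```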
- 3.2 Enterprise service-layer benchmarks (`tests/bench/enterprise_performance.py`; a fixture sketch follows the block)

```python
# Enterprise-scale service layer performance testing
import pytest

class TestEnterpriseServicePerformance:

    @pytest.mark.benchmark(group="enterprise_tool_service")
    def test_list_tools_million_scale(self, benchmark, enterprise_db_session):
        """Benchmark tool listing with 50K tools"""
        result = benchmark(tool_service.list_tools, enterprise_db_session, include_inactive=False)
        assert len(result) >= 45_000  # Should return most of 50K tools

    @pytest.mark.benchmark(group="enterprise_cache_performance")
    def test_cache_with_10gb_dataset(self, benchmark, enterprise_cache_manager):
        """Test cache performance with 10GB+ dataset"""
        large_data = {"tools": [{"id": i, "data": "x" * 1000} for i in range(100_000)]}

        def cache_operation():
            return enterprise_cache_manager.get_or_set(
                "enterprise:large_dataset",
                lambda: large_data,
                ttl=3600,
            )

        result = benchmark(cache_operation)
        assert len(result["tools"]) == 100_000

    @pytest.mark.benchmark(group="enterprise_multi_tenancy")
    def test_scope_filtering_million_teams(self, benchmark, enterprise_db_session):
        """Test multi-tenant scope filtering with 5M teams"""
        user_context = {
            "user_id": "enterprise_user",
            "teams": [f"team_{i}" for i in range(1000)],  # User in 1000 teams
        }
        result = benchmark(tool_service.list_tools_with_scope, enterprise_db_session, user_context)
        assert len(result) > 0

    @pytest.mark.benchmark(group="enterprise_federation")
    def test_federation_aggregation_100_gateways(self, benchmark, mock_federation):
        """Test federation aggregation across 100 gateways"""
        gateway_urls = [f"http://mock-gateway-{i}:9000" for i in range(100)]

        def federation_operation():
            return gateway_service.aggregate_federated_tools(
                gateway_urls, timeout=30, parallel_limit=20
            )

        result = benchmark(federation_operation)
        assert len(result) >= 50_000  # Should aggregate a significant number of tools
```
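The `enterprise_db_session`, `enterprise_cache_manager`, and `mock_federation` fixtures referenced above would live in `tests/bench/conftest.py`. A minimal sketch of the database fixture, assuming a pre-seeded database reachable via a hypothetical `ENTERPRISE_DATABASE_URL` environment variable; the other two fixtures would follow the same pattern.

```python
# tests/bench/conftest.py (sketch)
import os

import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

@pytest.fixture(scope="session")
def enterprise_db_session():
    """Session bound to the already-seeded enterprise-wave database."""
    engine = create_engine(os.environ["ENTERPRISE_DATABASE_URL"], pool_size=50)
    session = sessionmaker(bind=engine)()
    yield session
    session.close()
    engine.dispose()
```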
Phase 4: Enhanced Enterprise Monitoring & Analysis
- 4.1 Enterprise load test Grafana dashboard (`grafana/enterprise-loadtest.json`)

```json
{
  "dashboard": {
    "title": "MCP Gateway - Enterprise Load Test Analysis",
    "refresh": "5s",
    "time": {"from": "now-2h", "to": "now"},
    "panels": [
      {
        "title": "Request Rate by Wave Scale",
        "targets": [
          {"expr": "rate(http_requests_total[5m]) by (wave_scale, endpoint)", "legendFormat": "{{wave_scale}} - {{endpoint}}"}
        ],
        "yAxes": [{"unit": "reqps", "max": 10000}]
      },
      {
        "title": "Enterprise Cache Performance (L1/L2)",
        "targets": [
          {"expr": "rate(cache_l1_hits_total[5m])", "legendFormat": "L1 Hits/sec"},
          {"expr": "rate(cache_l2_hits_total[5m])", "legendFormat": "L2 Hits/sec"},
          {"expr": "rate(cache_misses_total[5m])", "legendFormat": "Cache Misses/sec"},
          {"expr": "cache_l1_memory_bytes / 1024 / 1024 / 1024", "legendFormat": "L1 Memory (GB)"}
        ]
      },
      {
        "title": "Federation Latency (100+ Gateways)",
        "targets": [
          {"expr": "histogram_quantile(0.50, rate(federation_request_duration_seconds_bucket[5m]))", "legendFormat": "P50"},
          {"expr": "histogram_quantile(0.95, rate(federation_request_duration_seconds_bucket[5m]))", "legendFormat": "P95"},
          {"expr": "histogram_quantile(0.99, rate(federation_request_duration_seconds_bucket[5m]))", "legendFormat": "P99"}
        ]
      },
      {
        "title": "Multi-tenancy Query Performance",
        "targets": [
          {"expr": "rate(db_query_duration_seconds_sum[5m]) / rate(db_query_duration_seconds_count[5m]) by (scope_type)", "legendFormat": "Avg {{scope_type}}"},
          {"expr": "histogram_quantile(0.95, rate(db_query_duration_seconds_bucket[5m])) by (scope_type)", "legendFormat": "P95 {{scope_type}}"}
        ]
      },
      {
        "title": "Memory Usage - Enterprise Scale",
        "targets": [
          {"expr": "process_resident_memory_bytes / 1024 / 1024 / 1024", "legendFormat": "Gateway RSS (GB)"},
          {"expr": "cache_l1_memory_bytes / 1024 / 1024 / 1024", "legendFormat": "L1 Cache (GB)"},
          {"expr": "redis_used_memory_bytes / 1024 / 1024 / 1024", "legendFormat": "Redis (GB)"},
          {"expr": "postgresql_shared_buffers_bytes / 1024 / 1024 / 1024", "legendFormat": "PostgreSQL (GB)"}
        ]
      },
      {
        "title": "Database Connection Pool Usage",
        "targets": [
          {"expr": "postgresql_connections_active", "legendFormat": "Active Connections"},
          {"expr": "postgresql_connections_idle", "legendFormat": "Idle Connections"},
          {"expr": "postgresql_connections_total", "legendFormat": "Total Connections"},
          {"expr": "postgresql_max_connections", "legendFormat": "Max Connections"}
        ]
      }
    ]
  }
}
```
Phase 5: Wave-Based CI Integration
- 5.1 Enterprise wave testing workflow (`.github/workflows/enterprise-soak.yml`)

```yaml
name: Enterprise Scale Soak Testing

on:
  schedule:
    - cron: '0 2 * * 0'   # Weekly Sunday 2 AM UTC
  workflow_dispatch:
    inputs:
      wave:
        description: 'Test wave to run'
        required: true
        default: 'small'
        type: choice
        options:
          - small
          - medium
          - large
          - enterprise
      duration_hours:
        description: 'Test duration in hours'
        default: '2'

jobs:
  enterprise-soak:
    runs-on: ubuntu-latest-8-cores   # Use 8-core runner for enterprise scale
    strategy:
      matrix:
        wave: [small, medium, large, enterprise]
        database: [postgresql]   # Only PostgreSQL for enterprise scale
        cache: [enabled]         # Always enable cache for enterprise
        federation: [true, false]
      fail-fast: false

    steps:
      - uses: actions/checkout@v4

      - name: Setup enterprise test environment
        run: |
          # Increase system limits for enterprise testing
          echo "fs.file-max = 2097152" | sudo tee -a /etc/sysctl.conf
          echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
          echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf
          sudo sysctl -p

          # Start enterprise test stack
          docker-compose -f docker-compose.enterprise.yml up -d

      - name: Tune PostgreSQL for enterprise load
        run: |
          docker exec postgresql psql -U postgres -c "
            ALTER SYSTEM SET shared_buffers = '4GB';
            ALTER SYSTEM SET work_mem = '256MB';
            ALTER SYSTEM SET max_connections = 1000;
            ALTER SYSTEM SET effective_cache_size = '12GB';
            SELECT pg_reload_conf();
          "

      - name: Seed enterprise data
        timeout-minutes: 120   # 2 hours max for enterprise wave
        run: |
          make seed-${{ matrix.wave }} PARALLEL_WORKERS=16 BATCH_SIZE=50000
        env:
          DATABASE_POOL_SIZE: 50

      - name: Run enterprise soak test
        timeout-minutes: 480   # 8 hours max
        run: |
          make soak-test-${{ matrix.wave }} \
            DURATION=${{ github.event.inputs.duration_hours || '2' }}h \
            USERS=2000 SPAWN_RATE=50 \
            FEDERATION_ENABLED=${{ matrix.federation }}
        env:
          DATABASE_TYPE: ${{ matrix.database }}
          CACHE_ENABLED: ${{ matrix.cache }}
          GUNICORN_WORKERS: 16
          LOCUST_WORKERS: 8

      - name: Capture enterprise flamegraph
        run: |
          make flamegraph-analysis DURATION=300   # 5 minutes

      - name: Generate enterprise capacity report
        run: |
          python scripts/generate_enterprise_report.py \
            --wave ${{ matrix.wave }} \
            --results reports/ \
            --output reports/enterprise-capacity-${{ matrix.wave }}.html

      - name: Upload enterprise artifacts
        uses: actions/upload-artifact@v4
        with:
          name: enterprise-soak-${{ matrix.wave }}-${{ matrix.federation && 'fed' || 'no-fed' }}
          retention-days: 30
          path: |
            reports/soak-*.html
            reports/flamegraph-*.svg
            reports/enterprise-capacity-*.html
            reports/cache-analysis-${{ matrix.wave }}.json
            reports/federation-analysis-*.json
```
Phase 6: Enhanced Makefile Targets for Wave Testing
- 6.1 Wave-based make targets (the mock-setup helper they call is sketched after the block)

```makefile
# Wave-specific data seeding
seed-small:
	@echo "Seeding SMALL wave (10K users, 50K teams)..."
	@python scripts/seed_enterprise.py --wave small $(SEED_ARGS)
	@echo "Small wave data ready"

seed-medium:
	@echo "Seeding MEDIUM wave (100K users, 500K teams)..."
	@python scripts/seed_enterprise.py --wave medium --parallel-workers 16 $(SEED_ARGS)
	@echo "Medium wave data ready"

seed-large:
	@echo "Seeding LARGE wave (500K users, 2.5M teams)..."
	@python scripts/seed_enterprise.py --wave large --parallel-workers 32 $(SEED_ARGS)
	@echo "Large wave data ready"

seed-enterprise:
	@echo "Seeding ENTERPRISE wave (1M users, 5M teams)..."
	@python scripts/seed_enterprise.py --wave enterprise --parallel-workers 64 \
		--batch-size 100000 $(SEED_ARGS)
	@echo "Enterprise wave data ready"

# Wave-specific soak testing
soak-test-small: seed-small
	@echo "SMALL wave soak test (15 min)..."
	@$(MAKE) _run_soak_test WAVE=small DURATION=15m USERS=100

soak-test-medium: seed-medium
	@echo "MEDIUM wave soak test (30 min)..."
	@$(MAKE) _run_soak_test WAVE=medium DURATION=30m USERS=500

soak-test-large: seed-large
	@echo "LARGE wave soak test (60 min)..."
	@$(MAKE) _run_soak_test WAVE=large DURATION=60m USERS=1500

soak-test-enterprise: seed-enterprise
	@echo "ENTERPRISE wave soak test (120 min)..."
	@$(MAKE) _run_soak_test WAVE=enterprise DURATION=120m USERS=2000

# Run all waves sequentially (for comprehensive testing)
soak-test-all-waves:
	@echo "Running ALL wave tests (4+ hours)..."
	@$(MAKE) soak-test-small
	@$(MAKE) soak-test-medium
	@$(MAKE) soak-test-large
	@$(MAKE) soak-test-enterprise
	@python scripts/generate_wave_comparison_report.py
	@echo "All wave tests complete - see reports/wave-comparison.html"

# Internal helper for running soak tests
_run_soak_test:
	@docker-compose -f docker-compose.enterprise.yml up -d smocker
	@python scripts/setup_federation_mocks.py --wave $(WAVE)
	@locust -f locustfiles/enterprise_scale.py --headless \
		--users $(USERS) --spawn-rate $(SPAWN_RATE) \
		--run-time $(DURATION) --html reports/soak-$(WAVE)-$(shell date +%Y%m%d).html
	@pytest tests/bench/ --benchmark-only --benchmark-json reports/benchmark-$(WAVE).json
	@python scripts/generate_wave_report.py --wave $(WAVE)
```
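The `_run_soak_test` target invokes `scripts/setup_federation_mocks.py`, which is not shown elsewhere in this issue. A minimal sketch that reuses the `EnterpriseMockManager` from 2.3; the per-wave gateway counts and the import path are illustrative assumptions.

```python
# scripts/setup_federation_mocks.py (sketch)
import asyncio

import click

GATEWAYS_PER_WAVE = {"small": 10, "medium": 25, "large": 50, "enterprise": 100}

@click.command()
@click.option("--wave", type=click.Choice(sorted(GATEWAYS_PER_WAVE)), default="small")
def main(wave: str) -> None:
    """Stand up the federated gateway mocks sized for the requested wave."""
    from loadtest.mocks.manager import EnterpriseMockManager  # assumed module path
    manager = EnterpriseMockManager()
    asyncio.run(manager.setup_enterprise_federation(GATEWAYS_PER_WAVE[wave]))

if __name__ == "__main__":
    main()
```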
Phase 7: Enterprise Documentation & Capacity Planning
- 7.1 Comprehensive enterprise guide (`docs/testing/enterprise-scalability.md`)

````markdown
# Enterprise-Scale Scalability Testing

## Wave System Overview

Our testing uses a **4-wave system** to validate performance from small deployments to massive enterprises:

| Wave | Scale | Use Case | Duration |
|------|-------|----------|----------|
| **Small** | 10K users, 50K teams | Department/Startup | 15 min |
| **Medium** | 100K users, 500K teams | Mid-size Enterprise | 30 min |
| **Large** | 500K users, 2.5M teams | Large Enterprise | 60 min |
| **Enterprise** | 1M users, 5M teams | Global Enterprise | 120 min |

## Quick Start

```bash
# Run specific wave
make soak-test-enterprise USERS=2000

# Run all waves (4+ hours)
make soak-test-all-waves

# View results
open reports/enterprise-capacity-enterprise.html
open http://localhost:3000/d/enterprise-loadtest
```

## Enterprise Capacity Planning Results

### Performance Baselines (with L1+L2 caching)

| Configuration | Small Wave | Medium Wave | Large Wave | Enterprise Wave |
|---------------|------------|-------------|------------|-----------------|
| Max RPS | 500 | 1,200 | 2,500 | 4,000 |
| P95 Latency | 25ms | 45ms | 85ms | 150ms |
| Memory Usage | 2GB | 6GB | 15GB | 25GB |
| DB Connections | 20 | 50 | 150 | 300 |
| Cache Hit Ratio | 98% | 96% | 94% | 92% |

### Federation Performance

| Federated Gateways | Tool Aggregation Time | Memory Overhead | Failure Tolerance |
|--------------------|-----------------------|-----------------|-------------------|
| 10 gateways | 150ms | +500MB | 2 failures |
| 50 gateways | 400ms | +2GB | 5 failures |
| 100 gateways | 800ms | +4GB | 10 failures |

### Recommended Infrastructure

#### Enterprise Wave (1M users, 5M teams)

Gateway Cluster:
- 4-8 instances: 8 CPU, 32GB RAM each
- Load balancer with session affinity
- Auto-scaling based on CPU >70%

Database:
- PostgreSQL: 16 CPU, 128GB RAM, 1TB SSD
- Read replicas: 2-4 instances for read scaling
- Connection pooling: pgbouncer with 300 max connections

Cache Layer:
- Redis cluster: 3 masters, 3 replicas
- 64GB RAM per instance
- Memory eviction: allkeys-lru

Monitoring:
- Prometheus: 8 CPU, 64GB RAM, 500GB storage
- Grafana: 4 CPU, 16GB RAM
- Log aggregation: Elasticsearch cluster
````
Updated Deliverables
- Wave-based enterprise seeder: `scripts/seed_enterprise.py` with 4-tier scaling system
- Enterprise smocker integration: `docker-compose.enterprise.yml` + 100+ gateway mocks
- Enterprise load scenarios: `locustfiles/enterprise_scale.py` with million-user patterns
- Service benchmarks: `tests/bench/enterprise_performance.py` for all enterprise scales
- Enterprise monitoring: `grafana/enterprise-loadtest.json` + comprehensive alerts
- Wave-based CI: Weekly enterprise testing with 8-hour test cycles
- Performance analysis: Distributed flamegraph capture + enterprise hotspot analysis
- Enterprise documentation: Complete capacity planning for 1M+ user deployments
Expected Enterprise Outcomes
Performance Baselines (Enterprise Wave with L1+L2 caching):
- 50K tool listing: <150ms P95 (vs 30+ seconds uncached)
- Federation mesh: 100 gateways aggregated in <800ms with intelligent caching
- Multi-tenancy: <15% query overhead for 5M team scope filtering
- Memory efficiency: L1 cache 4GB for 92% hit rate on enterprise datasets
Enterprise Capacity Planning Data:
- Safe production limits: 4,000 RPS sustained per 4-instance cluster
- Scale-out recommendations: Horizontal scaling patterns for 10M+ users
- Cache sizing: Memory requirements for different enterprise scales
- Federation limits: Maximum federated gateway count before timeout cascade
Infrastructure Recommendations:
- Detailed sizing for 1M+ user deployments
- Database sharding strategies for 10M+ teams
- Multi-region federation architecture
- Disaster recovery and failover procedures
Additional Notes
- Enterprise-realistic patterns: Based on actual Fortune 500 SaaS usage data
- Federation at scale: Tests 100+ gateway mesh with realistic failure patterns
- Wave progression: Each wave 5-10x larger than previous for scaling validation
- Memory efficiency: L1/L2 cache tuned for enterprise dataset sizes (10GB+)
- CI scalability: Weekly enterprise tests with trend analysis over months
- Production readiness: Direct infrastructure sizing for million-user deployments
- Cost optimization: Capacity planning includes cost-per-user analysis for different configurations