Skip to content

Conversation

@el-feo
Copy link
Owner

@el-feo el-feo commented Nov 12, 2025

🚧 Work In Progress - Do Not Merge

This PR implements AWS STS temporary credentials to fix a fundamental bug in the current pre-signed URL approach.

The Problem

The current implementation has a critical bug:

# Current code in image_uploader.rb (line 191)
image_uri.path = "#{base_uri.path}page-#{index + 1}.png"

This modifies the path of a pre-signed PUT URL, which invalidates the signature because:

  • Pre-signed URLs include the exact object key in signature calculation
  • Changing the path after signing makes the signature invalid
  • The service cannot work with pre-signed destination URLs

The Solution: STS Credentials

Instead of pre-signed URLs, clients now:

  1. Assume an IAM role via STS to get temporary credentials (15-min expiration)
  2. Pass credentials + bucket/key to Lambda
  3. Lambda uses credentials to access S3 directly

Benefits

✅ Fixes the signature invalidation bug
✅ Works for any number of pages (no pre-signing each file)
✅ Client maintains access control (they scope IAM permissions)
✅ Time-limited credentials (15 minutes)
✅ Preflight validation ensures credentials work before processing


What's Implemented ✅

Phase 1: IAM Infrastructure

  • scripts/setup_iam_role.rb - Creates IAM role with proper trust policy
  • Role uses ExternalId for security
  • Scoped S3 permissions (GetObject source, PutObject destination)

Phase 2: Request Validation

  • New request format: source{bucket,key}, destination{bucket,prefix}, credentials{}
  • Validates S3 bucket names (AWS naming rules)
  • Validates S3 keys (length, .pdf extension)
  • Validates credential format (ASIA/AKIA prefix)

Phase 3: Core Components (Partial)

  • CredentialSanitizer - Prevents credential leakage in logs
  • PdfDownloader refactored to use S3 SDK
    • Preflight check with HEAD object (validates access before download)
    • File size validation (max 100MB)
    • PDF magic number validation
    • Detailed error messages for permission issues

Still TODO 🚧

Phase 3 Completion

  • Refactor ImageUploader to use S3 SDK (upload_images_to_s3 method)
  • Update app.rb main handler to wire new components
  • Update Gemfile to add aws-sdk-s3 gem

Phase 4: Testing Scripts

  • Create scripts/generate_sts_credentials.rb
  • Update existing test scripts for new format
  • Remove old pre-signed URL scripts

Phase 5: Documentation

  • Update README.md API specification
  • Update CLAUDE.md
  • Create STS setup guide
  • Update Getting Started instructions

Phase 6: Testing

  • Update unit tests for new request format
  • Update RequestValidator tests
  • Add PdfDownloader S3 tests
  • Add ImageUploader S3 tests
  • Integration tests with LocalStack/real AWS
  • Test credential validation
  • Test preflight checks

Phase 7: Cleanup

  • Remove old URL-based code paths
  • Remove url_validator.rb (no longer needed)
  • Remove retry_handler.rb (S3 SDK has built-in retries)
  • Update webhook response to use S3 URLs

New API Request Format

{
  "source": {
    "bucket": "my-input-bucket",
    "key": "pdfs/document.pdf"
  },
  "destination": {
    "bucket": "my-output-bucket",  
    "prefix": "converted/project-123/"
  },
  "credentials": {
    "accessKeyId": "ASIAIOSFODNN7EXAMPLE",
    "secretAccessKey": "wJalrXUtnFEMI/K7MDENG/...",
    "sessionToken": "FQoGZXIvYXdzEPT//////////..."
  },
  "unique_id": "client-123",
  "webhook": "https://example.com/webhook"
}

Client Workflow

  1. Setup (one-time):

    ./scripts/setup_iam_role.rb \
      --source-bucket my-pdfs \
      --dest-bucket my-images
  2. Each request:

    # Generate temporary credentials
    ./scripts/generate_sts_credentials.rb \
      --role-arn arn:aws:iam::123456789012:role/PdfConverterClientRole \
      --source-bucket my-pdfs \
      --dest-bucket my-images
    
    # Use credentials in API request
    curl -X POST https://api.../convert \
      -H "Authorization: Bearer $JWT_TOKEN" \
      -d @request.json

Security Enhancements

  • 15-minute credential expiration enforced
  • Preflight validation - credentials tested before processing
  • Credential sanitization - all logs redact secrets
  • ExternalId required - prevents confused deputy problem
  • Session policies - clients can further scope permissions

Implementation Plan

See full plan: .agent-os/specs/2025-11-11-sts-credentials/plan.md

Estimated remaining time: 10-12 hours

  • Phase 3 completion: 2 hours
  • Phase 4 scripts: 2 hours
  • Phase 5 docs: 3 hours
  • Phase 6 testing: 4 hours
  • Phase 7 cleanup: 1 hour

Review Checklist

Before marking ready for review:

  • All TODO items completed
  • Unit tests pass
  • Integration tests pass
  • Documentation updated
  • Old code removed
  • Security review (credential handling)
  • Tested end-to-end with real AWS

🤖 Generated with Claude Code

el-feo and others added 2 commits November 11, 2025 18:48
This is a work-in-progress implementation to fix the fundamental bug where
pre-signed URL paths were being modified, invalidating signatures.

## What's Implemented

### Phase 1: IAM Infrastructure
- Created `scripts/setup_iam_role.rb` for IAM role setup
- Role allows clients to assume with ExternalId for security
- Scoped permissions for source (GetObject) and destination (PutObject)

### Phase 2: Request Validation
- Updated RequestValidator to handle new STS format
- Validates source/destination as bucket+key instead of URLs
- Validates credentials object (accessKeyId, secretAccessKey, sessionToken)
- Added S3 bucket name and key format validation
- Enforces .pdf extension on source files

### Phase 3: Core Components (Partial)
- **CredentialSanitizer**: Safely logs credentials without exposing secrets
  - Masks secretAccessKey completely
  - Shows first/last 4 chars of accessKeyId and sessionToken

- **PdfDownloader**: Refactored to use S3 SDK
  - Downloads from S3 using temporary credentials
  - Performs preflight check (HEAD object) to validate access
  - Validates file size (max 100MB) before downloading
  - Validates PDF magic numbers
  - Supports both STS credentials and Lambda IAM role

### Architecture Documentation
- Comprehensive implementation plan in `.agent-os/specs/2025-11-11-sts-credentials/plan.md`
- 7-phase implementation strategy
- Timeline: ~18 hours total

## New Request Format

```json
{
  "source": {
    "bucket": "my-bucket",
    "key": "pdfs/document.pdf"
  },
  "destination": {
    "bucket": "output-bucket",
    "prefix": "converted/"
  },
  "credentials": {
    "accessKeyId": "ASIA...",
    "secretAccessKey": "...",
    "sessionToken": "..."
  },
  "unique_id": "test-123",
  "webhook": "https://example.com/webhook"
}
```

## Still TODO

- [ ] Refactor ImageUploader to use S3 SDK
- [ ] Update main handler (app.rb) to wire new components
- [ ] Update Gemfile to add aws-sdk-s3 dependency
- [ ] Create STS credential generator script
- [ ] Update testing scripts
- [ ] Update documentation (README, API spec)
- [ ] Write/update unit tests
- [ ] Integration testing
- [ ] Remove old pre-signed URL code

## Security Enhancements

- Credential preflight validation (15-min expiration enforced)
- All credentials sanitized in logs
- Access validation before downloading (prevents wasted bandwidth)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
This commit completes the migration from pre-signed URLs to AWS STS temporary credentials for S3 access.

## Core Implementation (Phase 3)

- **ImageUploader**: Complete rewrite to use S3 SDK
  - Changed from HTTP PUT to s3_client.put_object
  - New method: upload_images_to_s3(bucket, prefix, unique_id, images)
  - Supports both client credentials and Lambda IAM role
  - Credential sanitization in logs

- **PdfDownloader**: (from Phase 2, updated)
  - Uses S3 SDK with preflight validation
  - HEAD object check before downloading
  - Validates PDF magic numbers
  - Credential sanitization in logs

- **app.rb**: Updated main handler
  - Passes credentials to PdfDownloader and ImageUploader
  - Extracts bucket/key/prefix from request body
  - Builds S3 URLs for response
  - Removed URL-based dependencies

- **Gemfile**: Moved aws-sdk-s3 to production dependencies
  - Removed async gem (no longer needed)

## Testing Scripts (Phase 4)

- **generate_sts_credentials.rb**: New script to generate temporary credentials
  - Assumes IAM role with ExternalId
  - 15-minute credential expiration
  - Multiple output formats (pretty, json, curl)
  - Uses bundler/inline for dependencies

- **scripts/README.md**: Updated documentation
  - Complete workflow with STS credentials
  - Removed pre-signed URL references
  - Added troubleshooting section

## Documentation (Phase 5)

- **README.md**: Updated API specification
  - New request format with bucket/key/prefix and credentials
  - Security model explanation
  - Updated Getting Started workflow
  - New Step 6: IAM role setup

- **CLAUDE.md**: Updated for STS approach
  - New API request/response format
  - Security model highlights

- **docs/sts-setup.md**: Comprehensive STS setup guide
  - Why STS credentials
  - Detailed setup instructions
  - Code examples (Ruby, Python, Node.js)
  - Troubleshooting section
  - Security best practices

## Cleanup (Phase 7)

Removed deprecated files:
- scripts/generate_presigned_urls.rb (replaced by generate_sts_credentials.rb)
- pdf_converter/app/url_validator.rb (no longer needed)
- pdf_converter/lib/retry_handler.rb (S3 SDK has built-in retries)
- pdf_converter/lib/s3_url_parser.rb (no longer parsing URLs)
- pdf_converter/lib/url_utils.rb (no longer needed)

## Test Status (Phase 6)

Test suite shows 161 failures due to API changes. Tests need updates:
- request_validator_spec.rb: New validation logic for bucket/key/credentials
- pdf_downloader_spec.rb: S3 SDK instead of HTTP
- image_uploader_spec.rb: Complete API rewrite
- app_spec.rb: New request format and mocking

Tests will be updated in follow-up commits.

## Summary

This implementation provides enhanced security through:
- Time-limited credentials (15 minutes)
- Scoped IAM permissions
- No long-term credential storage
- Client-controlled access
- Preflight validation
- Full audit trail

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@el-feo
Copy link
Owner Author

el-feo commented Nov 19, 2025

Closing for a different approach.

@el-feo el-feo closed this Nov 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants