A serverless PDF-to-image conversion service built with AWS SAM. The application runs as a containerized Ruby Lambda function and provides synchronous PDF processing with JWT authentication and optional webhook notifications.
- pdf_converter/ - Code for the PDF conversion Lambda function and Docker configuration
- app/ - Application modules (JWT authenticator, URL validator, PDF converter, etc.)
- spec/ - RSpec test suite for the application
- lib/ - Shared library code
- Dockerfile - Multi-stage Docker build configuration
- Gemfile - Ruby dependencies
- events/ - Sample invocation events for testing the function
- template.yaml - SAM template defining AWS resources
- samconfig.toml - SAM CLI deployment configuration
This guide walks you through deploying the PDF Converter Service to your AWS account from scratch.
- AWS CLI installed and configured with your AWS credentials
- SAM CLI installed
- Docker installed and running
- Ruby 3.4 (optional, for local development)
- An AWS account with permissions to create Lambda functions, API Gateway, ECR repositories, and Secrets Manager secrets
If you haven't already, configure the AWS CLI with your credentials:
aws configure
Enter your AWS Access Key ID, Secret Access Key, default region (e.g., us-east-1), and output format (e.g., json).
The service uses JWT authentication. Create a secret in AWS Secrets Manager to store your JWT signing key:
# Generate a secure random secret (256-bit recommended)
SECRET_VALUE=$(openssl rand -base64 32)
# Create the secret in AWS Secrets Manager
aws secretsmanager create-secret \
--name pdf-converter/jwt-secret \
--secret-string "$SECRET_VALUE" \
--region us-east-1
# Save the secret value for later use in generating tokens
echo "Your JWT secret: $SECRET_VALUE"
Important: Save the secret value securely - you'll need it to generate JWT tokens for API authentication.
Clone the repository and deploy using SAM:
# Clone the repository
git clone https://github.com/your-username/content_processing.git
cd content_processing
# Build the application
sam build
# Deploy (first time - this will prompt for configuration)
sam deploy --guided
During sam deploy --guided, you'll be prompted for:
- Stack Name: Press Enter to use the default (content-processing)
- AWS Region: Enter your preferred region (e.g., us-east-1)
- Confirm changes before deploy: Y (recommended)
- Allow SAM CLI IAM role creation: Y (required)
- Disable rollback: N (recommended)
- Save arguments to configuration file: Y (saves settings for future deploys)
The deployment will:
- Create an ECR repository for the Docker image
- Build and push the container image
- Create the Lambda function
- Set up API Gateway with a /convert endpoint
- Configure IAM roles and permissions
After successful deployment, note the API endpoint URL from the outputs:
Outputs
-------------------------------------------------------------------
PdfConverterApi = https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/Prod/convert/
To call the API, you need a valid JWT token. Here's how to generate one using Ruby:
require 'jwt'
# Use the secret you created in Step 2
secret = 'your-secret-from-step-2'
# Generate a token that expires in 1 hour
payload = {
sub: 'user-identifier',
exp: Time.now.to_i + 3600
}
token = JWT.encode(payload, secret, 'HS256')
puts "Authorization: Bearer #{token}"
Or using Python:
import jwt
import time
# Use the secret you created in Step 2
secret = 'your-secret-from-step-2'
# Generate a token that expires in 1 hour
payload = {
'sub': 'user-identifier',
'exp': int(time.time()) + 3600
}
token = jwt.encode(payload, secret, algorithm='HS256')
print(f"Authorization: Bearer {token}")
Create pre-signed S3 URLs for the source (PDF) and the destination (zip file), then call the API:
# Example using curl (replace with your actual URLs and token)
curl -X POST https://your-api-endpoint.amazonaws.com/Prod/convert \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"source": "https://s3.amazonaws.com/your-bucket/input.pdf?X-Amz-...",
"destination": "https://s3.amazonaws.com/your-bucket/output.zip?X-Amz-...",
"webhook": "https://your-webhook-endpoint.com/notify",
"unique_id": "test-123"
}'
Note: The destination URL should be a pre-signed PUT URL for a .zip file, not a folder. The service will upload a single zip file containing all converted PNG images.
For instructions on generating pre-signed S3 URLs, see the AWS documentation.
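The curl call above can also be assembled in Ruby using only the standard library. This is a minimal sketch, not part of the repository: the helper names build_convert_request and send_convert_request are hypothetical, and the endpoint, token, and URLs are placeholders you supply. The webhook field is left out of the body when it is not given, since the service treats it as optional.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) the POST request for the /convert endpoint.
# Hypothetical helper; all argument values are placeholders from the caller.
def build_convert_request(endpoint:, token:, source:, destination:, unique_id:, webhook: nil)
  request = Net::HTTP::Post.new(URI.parse(endpoint))
  request['Authorization'] = "Bearer #{token}"
  request['Content-Type'] = 'application/json'

  body = { source: source, destination: destination, unique_id: unique_id }
  body[:webhook] = webhook if webhook # webhook is optional
  request.body = JSON.generate(body)
  request
end

# Send a request built above and return the parsed JSON response body.
def send_convert_request(request)
  uri = request.uri
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  JSON.parse(response.body)
end
```

Keeping request construction separate from sending makes it possible to inspect the headers and body before any network call is made.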
To simplify testing, this repository includes utility scripts in the scripts/ directory. The scripts automatically install their dependencies on first run using bundler/inline - no manual gem installation needed!
Generate JWT Token:
./scripts/generate_jwt_token.rb
Generate Pre-signed S3 URLs:
./scripts/generate_presigned_urls.rb \
--bucket my-bucket \
--source-key pdfs/test.pdf \
--dest-prefix output/
See scripts/README.md for detailed usage instructions and examples.
sam build # Build the Docker image and prepare for deployment
sam deploy # Deploy using saved configuration
sam deploy --guided # First-time deployment with prompts
sam local start-api # Run API locally on port 3000
sam local invoke PdfConverterFunction --event events/event.json # Test function with sample event
cd pdf_converter # Navigate to Lambda function directory
bundle install # Install dependencies including RSpec
bundle exec rspec # Run RSpec tests
For local integration testing with LocalStack (requires LocalStack to be running):
# Start LocalStack (requires Docker)
docker run --rm -d -p 4566:4566 -p 4571:4571 localstack/localstack
# Run LocalStack integration tests
cd pdf_converter
LOCALSTACK_ENDPOINT=http://localhost:4566 \
AWS_ENDPOINT_URL=http://localhost:4566 \
AWS_REGION=us-east-1 \
bundle exec rspec spec/integration/localstack_integration_spec.rb --format documentation
LocalStack provides a local AWS cloud stack for testing AWS services without incurring costs or requiring AWS credentials.
sam logs -n PdfConverterFunction --stack-name content-processing --tail # View Lambda logs
sam delete --stack-name content-processing # Delete the deployed stack
Converts a PDF to images and delivers them as a zip file.
Request Body:
{
"source": "https://s3.amazonaws.com/bucket/input.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...",
"destination": "https://s3.amazonaws.com/bucket/output.zip?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...",
"webhook": "https://example.com/webhook",
"unique_id": "client-123"
}
Important: Both source and destination URLs must be pre-signed S3 URLs. Pre-signed URLs provide:
- Enhanced security: No AWS credentials are exposed in the Lambda function
- Fine-grained access control: URLs have time-limited access and specific permissions (GET for source, PUT for destination)
- Client control: Clients generate URLs with their own AWS credentials, maintaining data sovereignty
- Audit trail: All S3 access is logged under the client's AWS account
Note on destination URL: The destination URL should be a pre-signed PUT URL for a zip file (e.g., output.zip), not a folder path. The service will create a zip file containing all converted images.
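The URL requirements above can be checked on the client before a request is sent. The following is a sketch of such a pre-flight check, not the service's own URL validator: convert_request_problems and parse_https_url are hypothetical names, requiring https is an assumption based on the examples, and the pre-signed test only looks for the X-Amz-Signature query parameter that SigV4 pre-signed URLs carry.

```ruby
require 'uri'

# Return the parsed URI if value is a well-formed https URL, otherwise nil.
def parse_https_url(value)
  uri = URI.parse(value.to_s)
  uri.is_a?(URI::HTTPS) ? uri : nil
rescue URI::InvalidURIError
  nil
end

# Client-side sanity checks for a /convert request body (Hash with string keys).
# Returns a list of problems; an empty list means the body looks plausible.
def convert_request_problems(body)
  problems = []
  parsed = {}

  %w[source destination].each do |field|
    uri = parse_https_url(body[field])
    if uri.nil?
      problems << "#{field} must be an https URL" # assumption: https only
    elsif !uri.query.to_s.include?('X-Amz-Signature=')
      problems << "#{field} does not look pre-signed (no X-Amz-Signature)"
    end
    parsed[field] = uri
  end

  # The destination must name a zip object, not a folder path.
  destination = parsed['destination']
  if destination && !destination.path.end_with?('.zip')
    problems << 'destination must point to a .zip object'
  end

  problems
end
```

A check like this cannot confirm that a URL was signed for the right HTTP method (GET for the source, PUT for the destination) or that it has not expired; only the request to S3 reveals that.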
Response:
{
"message": "PDF conversion and zip upload completed",
"images": "https://s3.amazonaws.com/bucket/output.zip",
"unique_id": "client-123",
"status": "completed",
"pages_converted": 2,
"metadata": {
"pdf_page_count": 2,
"conversion_dpi": 300,
"image_format": "png"
}
}
Zip File Contents: The zip file contains PNG images named as {unique_id}-0.png, {unique_id}-1.png, etc., corresponding to each page of the PDF.
Note: The service processes PDFs synchronously and returns the zip file URL in the response. If a webhook URL is provided, a notification is also sent asynchronously (fire-and-forget) upon completion.
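A client that downloads the zip can use the naming scheme described above together with pages_converted from the response to confirm that every page arrived. The sketch below assumes zero-based numbering exactly as in the {unique_id}-0.png pattern; the helper names are hypothetical, and reading the actual entry names out of the zip (for example with rubyzip) is left to the caller.

```ruby
# Expected entry names inside the zip for a given unique_id and page count.
# Pages are numbered from 0, matching the {unique_id}-0.png pattern.
def expected_zip_entries(unique_id, pages_converted)
  (0...pages_converted).map { |index| "#{unique_id}-#{index}.png" }
end

# Names that should be in the zip but were not found among actual_names.
def missing_pages(unique_id, pages_converted, actual_names)
  expected_zip_entries(unique_id, pages_converted) - actual_names
end
```

For the sample response above (unique_id "client-123", pages_converted 2), the expected entries would be client-123-0.png and client-123-1.png.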
The application follows AWS SAM patterns with containerized Ruby Lambda functions:
- Lambda Function: 2048 MB memory, 60-second timeout, Ruby 3.4 runtime
- Authentication: JWT-based authentication using AWS Secrets Manager
- Packaging: Container-based deployment using multi-stage Docker builds
- API: REST API via API Gateway with a /convert endpoint
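To illustrate what HS256 authentication involves, the sketch below signs and verifies a token using only Ruby's standard library. It is not the service's authenticator, which relies on the jwt gem and may apply different checks; sign_hs256 and verify_hs256 are hypothetical names, and in the deployed function the secret would come from AWS Secrets Manager rather than a literal string.

```ruby
require 'json'
require 'openssl'

# Base64url without padding, as used for JWT segments.
def base64url_encode(bytes)
  [bytes].pack('m0').tr('+/', '-_').delete('=')
end

def base64url_decode(text)
  padded = text + '=' * ((4 - text.length % 4) % 4)
  padded.tr('-_', '+/').unpack1('m0')
end

# Decode one JWT segment into a Hash.
def decode_json_segment(segment)
  JSON.parse(base64url_decode(segment).force_encoding('UTF-8'))
end

# Produce header.payload.signature signed with HMAC-SHA256.
def sign_hs256(payload, secret)
  header = { alg: 'HS256', typ: 'JWT' }
  signing_input = [header, payload].map { |part| base64url_encode(JSON.generate(part)) }.join('.')
  signature = OpenSSL::HMAC.digest('SHA256', secret, signing_input)
  "#{signing_input}.#{base64url_encode(signature)}"
end

# Return the payload Hash if the token is valid and unexpired, otherwise nil.
def verify_hs256(token, secret, now: Time.now.to_i)
  parts = token.split('.')
  return nil unless parts.length == 3

  header_b64, payload_b64, signature_b64 = parts
  return nil unless decode_json_segment(header_b64)['alg'] == 'HS256'

  expected = OpenSSL::HMAC.digest('SHA256', secret, "#{header_b64}.#{payload_b64}")
  # Constant-time comparison avoids leaking how many bytes matched.
  return nil unless OpenSSL.secure_compare(expected, base64url_decode(signature_b64))

  payload = decode_json_segment(payload_b64)
  return nil if payload['exp'] && payload['exp'].to_i <= now

  payload
rescue ArgumentError, JSON::ParserError
  nil
end
```

Pinning the accepted algorithm to HS256 before checking the signature is a deliberate choice: it prevents a token from selecting a weaker or unsigned algorithm through its own header.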
The Lambda function uses these environment variables:
- JWT_SECRET_NAME: Name of the secret in AWS Secrets Manager (default: pdf-converter/jwt-secret)
- CONVERSION_DPI: DPI resolution for PDF to image conversion (default: 300)
- PNG_COMPRESSION: PNG compression level 0-9 (default: 6)
- MAX_PAGES: Maximum number of pages allowed per PDF (default: 500)
- VIPS_WARNING: Controls libvips warning output (default: 0)
- AWS_REGION: AWS region for Secrets Manager (set by the Lambda runtime to the region the function is deployed in, e.g., us-east-1)
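The variables and defaults listed above could be read along the following lines. This is a sketch under stated assumptions rather than the function's actual configuration code: load_converter_config is a hypothetical name, and clamping PNG_COMPRESSION into the documented 0-9 range is an illustrative choice (the real function may reject out-of-range values instead).

```ruby
# Read the documented environment variables, falling back to the documented defaults.
# Accepts any Hash-like object so it can be exercised without touching ENV.
def load_converter_config(env = ENV)
  {
    jwt_secret_name: env.fetch('JWT_SECRET_NAME', 'pdf-converter/jwt-secret'),
    conversion_dpi: Integer(env.fetch('CONVERSION_DPI', '300'), 10),
    # Assumption: out-of-range values are clamped into 0-9 rather than rejected.
    png_compression: Integer(env.fetch('PNG_COMPRESSION', '6'), 10).clamp(0, 9),
    max_pages: Integer(env.fetch('MAX_PAGES', '500'), 10)
  }
end
```

Using Integer(value, 10) instead of to_i makes a malformed value such as "30o" raise an ArgumentError at startup instead of silently becoming 30.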
- jwt (~> 2.7): JSON Web Token implementation for authentication
- aws-sdk-secretsmanager (~> 1): AWS SDK for secure key retrieval
- json (~> 2.9): JSON parsing and generation
- ruby-vips (~> 2.2): Ruby bindings for libvips image processing library
- rubyzip (~> 2.3): Zip file creation and manipulation
- async (~> 2.6): Asynchronous processing for batch uploads
- rspec (~> 3.12): Testing framework
- webmock (~> 3.19): HTTP request stubbing for tests
- aws-sdk-s3 (~> 1): AWS S3 SDK for integration tests
- simplecov (~> 0.22): Code coverage analysis
- rubocop (~> 1.81): Ruby code linter and formatter
- rubycritic (~> 4.9): Code quality analysis tool