aws-samples · ycadosh · Sep 10, 2025
diff --git a/LICENSE b/LICENSE
@@ -1,17 +1,21 @@
-MIT No Attribution
+MIT License
 
-Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+Copyright (c) 2025 Pipecat Voice AI Agent AWS Deployment
 
-Permission is hereby granted, free of charge, to any person obtaining a copy of
-this software and associated documentation files (the "Software"), to deal in
-the Software without restriction, including without limitation the rights to
-use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
-the Software, and to permit persons to whom the Software is furnished to do so.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
 
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
-FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
-COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
-IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
-CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
 
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/speech-to-speech/README.md b/speech-to-speech/README.md
@@ -43,6 +43,10 @@ The following projects were developed by AWS teams and showcase examples of how
     This serverless implementation provides a lightweight, easily deployable, and scalable Nova Sonic infrastructure using AWS Lambda and AppSync Events, offering a streamlined approach to real-time speech-to-speech communication. It features serverless real-time communication between server and client using AppSync Events, reference to past conversation history, tool use implementation, automatic resume for conversations exceeding 8 minutes, and an extensible web UI built with Next.js.
 
 
+- [Pipecat Voice AI Agent - Production AWS Deployment](sample-codes/pipecat-voice-agent/)
+
+    A comprehensive production-ready deployment of the Pipecat Voice AI Agent featuring dual-channel voice interactions through both Twilio phone calls and WebRTC browser chat. This sample demonstrates AWS Nova Sonic integration with complete infrastructure as code using AWS CDK, supporting both ECS and EKS deployment options. It includes SSL certificate management for Twilio webhooks, auto-scaling, monitoring, security best practices, and comprehensive documentation for production deployments.
+
 - [Sonic Playground for Experimenting](https://github.com/aws-samples/sample-sonic-java-playground)
 
     This solution serves as an experimental playground for developers to test and optimize Nova Sonic capabilities by configuring various model parameters and finding the optimal settings for their specific use cases. The application supports creating new conversation sessions with voice IDs for language selection, TopP, Temperature, MaxTokens for response length control, and system prompts. Built with Java Spring Boot and React, it provides a reference implementation for speech-to-speech applications.
diff --git a/speech-to-speech/sample-codes/pipecat-voice-agent/.env.example b/speech-to-speech/sample-codes/pipecat-voice-agent/.env.example
@@ -0,0 +1,25 @@
+# Daily.co API Configuration
+DAILY_API_KEY=your_daily_api_key_here
+DAILY_API_URL=https://api.daily.co/v1
+
+# AWS Configuration
+AWS_ACCESS_KEY_ID=your_aws_access_key_here
+AWS_SECRET_ACCESS_KEY=your_aws_secret_key_here
+AWS_REGION=us-east-1
+
+# Twilio Configuration (for phone service)
+TWILIO_RECOVERY_CODE=your_twilio_recovery_code
+TWILIO_ACCOUNT_SID=your_twilio_account_sid
+TWILIO_AUTH_TOKEN=your_twilio_auth_token
+TWILIO_PHONE_NUMBER=+1234567890
+TWILIO_SID=your_twilio_sid
+TWILIO_SECRET=your_twilio_secret
+TWILIO_AUTH_LIVE=your_twilio_auth_live
+
+# Optional Configuration
+ENVIRONMENT=development
+LOG_LEVEL=INFO
+HOST=0.0.0.0
+FAST_API_PORT=7860
+MAX_BOTS_PER_ROOM=1
+MAX_CONCURRENT_ROOMS=10
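
Review note on `.env.example`: every credential above ships as a `your_...` placeholder, so a quick check before running `scripts/setup-secrets.py` can catch values that were never filled in. The following is a minimal sketch, not part of this PR. The required-key list is taken from the README's "Environment Variables" section, and the placeholder rule (values starting with `your_`, or the sample phone number) is an assumption.

```python
from pathlib import Path

# Keys the README lists as required configuration.
REQUIRED_KEYS = (
    "DAILY_API_KEY",
    "AWS_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
)

# Assumption: this sample value from .env.example is never a real number.
SAMPLE_PHONE_NUMBER = "+1234567890"


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unresolved_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are missing, empty, or still placeholders."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or value.startswith("your_") or value == SAMPLE_PHONE_NUMBER:
            problems.append(key)
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = unresolved_keys(parse_env(env_path.read_text()))
        print("Unresolved keys:", ", ".join(missing) if missing else "none")
    else:
        print("No .env file found; copy .env.example first.")
```

Run from the sample's root directory, this prints the keys that still need real values.
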
diff --git a/speech-to-speech/sample-codes/pipecat-voice-agent/README.md b/speech-to-speech/sample-codes/pipecat-voice-agent/README.md
@@ -0,0 +1,164 @@
+# Pipecat Voice AI Agent - AWS Cloud Deployment
+
+A production-ready containerized deployment of the Pipecat Voice AI Agent on AWS, featuring both WebRTC and Twilio phone integration with AWS Nova Sonic for natural voice conversations. Supports both ECS and EKS deployment options.
+
+## Overview
+
+This sample demonstrates how to deploy a voice AI agent using:
+- **AWS Nova Sonic** for speech-to-speech conversation (speech understanding and speech generation in a single model)
+- **Pipecat framework** for voice AI conversations
+- **Twilio** for phone call integration
+- **Daily.co** for WebRTC browser-based voice chat
+- **AWS ECS/EKS** for scalable container deployment
+- **AWS CDK** for infrastructure as code
+
+## Architecture
+
+The solution provides two deployment options:
+- **ECS**: Managed container orchestration with Fargate
+- **EKS**: Kubernetes-native deployment with Fargate
+
+Both support:
+- Phone calls via Twilio WebSocket integration
+- Browser voice chat via WebRTC
+- AWS Nova Sonic for natural voice processing
+- Production-ready monitoring and scaling
+
+## Prerequisites
+
+- AWS CLI configured with appropriate permissions
+- Node.js 18+ and npm
+- Docker
+- Python 3.10+
+- AWS CDK CLI (`npm install -g aws-cdk`)
+
+## Quick Start
+
+1. **Clone and setup**:
+```bash
+git clone <repository-url>
+cd speech-to-speech/sample-codes/pipecat-voice-agent
+./setup-project.sh
+```
+
+2. **Configure secrets**:
+```bash
+cp .env.example .env
+# Edit .env with your API keys
+python3 scripts/setup-secrets.py
+```
+
+3. **Deploy infrastructure** (choose ECS or EKS):
+
+**ECS Deployment:**
+```bash
+cd infrastructure
+./deploy.sh --environment test --region us-east-1
+```
+
+**EKS Deployment:**
+```bash
+cd infrastructure/-eks
+cdk deploy PipecatEksStack --parameters environment=test
+```
+
+4. **Build and deploy application**:
+```bash
+./scripts/build-and-push.sh -e test -t latest
+./scripts/deploy-service.sh -e test -t latest
+```
+
+## Key Features
+
+### Voice AI Capabilities
+- **Natural Conversations**: AWS Nova Sonic provides human-like speech synthesis
+- **Real-time Processing**: Low-latency streaming speech-to-speech
+- **Multi-channel Support**: Both phone calls and web browser voice chat
+- **Function Calling**: Example weather function with extensible architecture
+
+### Production Infrastructure
+- **Auto-scaling**: ECS/EKS services scale based on demand
+- **High Availability**: Multi-AZ deployment with load balancing
+- **Security**: AWS Secrets Manager, IAM roles, VPC isolation
+- **Monitoring**: CloudWatch logs, metrics, and health checks
+- **SSL/TLS**: Automatic certificate management for Twilio webhooks
+
+### Twilio Integration
+- **Phone Number Support**: Inbound calls to your Twilio number
+- **WebSocket Streaming**: Real-time bidirectional audio
+- **SSL Certificate Requirements**: Production-ready HTTPS endpoints
+- **Call Management**: Active call monitoring and session handling
+
+## Environment Variables
+
+Required configuration (stored in AWS Secrets Manager):
+
+```bash
+# Daily.co WebRTC
+DAILY_API_KEY=your_daily_api_key
+
+# AWS Configuration
+AWS_REGION=us-east-1
+AWS_ACCESS_KEY_ID=your_access_key
+AWS_SECRET_ACCESS_KEY=your_secret_key
+
+# Twilio Phone Integration
+TWILIO_ACCOUNT_SID=your_account_sid
+TWILIO_AUTH_TOKEN=your_auth_token
+TWILIO_PHONE_NUMBER=+1234567890
+```
+
+## Testing Your Deployment
+
+### WebRTC Voice Chat
+1. Visit your load balancer URL
+2. Click "Connect" to join a voice room
+3. Speak to interact with the AI agent
+
+### Phone Integration
+1. Configure Twilio webhook to point to your deployment
+2. Call your Twilio phone number
+3. Have a voice conversation with the AI
+
+## Important: Twilio SSL Requirements
+
+For production Twilio integration:
+- **Valid SSL Certificate**: Must be from a trusted CA (Let's Encrypt, etc.)
+- **No Self-Signed Certificates**: Twilio rejects untrusted certificates
+- **HTTPS Required**: Use standard port 443
+- **Load Balancer SSL**: Certificates issued by AWS Certificate Manager (ACM) and attached to the load balancer can be renewed automatically by ACM
+
+## Documentation
+
+- [EKS Architecture Overview](docs/EKS_ARCHITECTURE.md)
+- [Deployment Guide](infrastructure/DEPLOYMENT_GUIDE.md)
+- [Cleanup Guide](docs/CLEANUP_GUIDE.md)
+- [Troubleshooting Guide](docs/TROUBLESHOOTING_GUIDE.md)
+
+## Cost Considerations
+
+- **Fargate**: Pay only for running containers
+- **Nova Sonic**: Usage-based pricing for speech processing
+- **Load Balancers**: Fixed hourly cost plus data transfer
+- **Twilio**: Per-minute charges for phone calls
+
+## Security Best Practices
+
+- Secrets stored in AWS Secrets Manager
+- IAM roles with least-privilege access
+- VPC isolation with security groups
+- Container runs as non-root user
+- TLS encryption for all external communication
+
+## Contributing
+
+This sample follows AWS best practices for:
+- Infrastructure as Code (CDK)
+- Container security
+- Monitoring and observability
+- Cost optimization
+- Multi-AZ high availability
+
+## License
+
+This sample code is made available under the MIT License. See the LICENSE file.
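
Review note on the README's "Phone Integration" steps: "Configure Twilio webhook to point to your deployment" leaves open what the webhook should return. With Twilio Media Streams, the voice webhook typically answers with TwiML that connects the call to a secure WebSocket. The sketch below builds such a response with the standard library. The function name and the `/ws` path are hypothetical, and the PR's actual endpoint may differ.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(host: str, path: str = "/ws") -> str:
    """Return TwiML that connects an inbound call to a bidirectional media stream.

    `host` is the public hostname of the deployment; `path` is a
    hypothetical WebSocket route, not one confirmed by this PR.
    """
    if "://" in host:
        raise ValueError("pass a bare hostname, not a URL")
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # <Stream> needs a secure WebSocket URL, which is why the README
    # insists on a trusted certificate on port 443.
    ET.SubElement(connect, "Stream", url=f"wss://{host}{path}")
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


if __name__ == "__main__":
    print(build_stream_twiml("voice.example.com"))
```

The webhook handler would return this string with an XML content type. The hostname must be the one covered by the load balancer's certificate.
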
diff --git a/speech-to-speech/sample-codes/pipecat-voice-agent/aws/README.md b/speech-to-speech/sample-codes/pipecat-voice-agent/aws/README.md
@@ -0,0 +1,51 @@
+# AWS Configuration
+
+This directory contains AWS-specific configuration files for the Pipecat ECS deployment.
+
+## Structure
+
+### Policies (`policies/`)
+
+- `ecs-task-execution-role-trust-policy.json` - Trust policy for ECS task execution role
+- `execution-role-secrets-policy.json` - Policy for accessing AWS Secrets Manager
+- `pipecat-task-policy.json` - Task-specific permissions policy
+
+### Task Definitions (`task-definitions/`)
+
+- `phone-task-definition.json` - ECS task definition for the phone service
+
+## Usage
+
+These files are typically used by:
+
+- AWS CDK infrastructure deployment (in `infrastructure/` directory)
+- Manual AWS CLI commands for policy and role creation
+- ECS service deployment scripts
+
+## Policy Overview
+
+### ECS Task Execution Role
+
+Allows ECS to pull images from ECR and write logs to CloudWatch.
+
+### Secrets Access Policy
+
+Grants access to specific secrets in AWS Secrets Manager for:
+
+- Daily.co API keys
+- Twilio credentials
+- Other application secrets
+
+### Task Policy
+
+Application-level permissions for:
+
+- AWS Bedrock access
+- CloudWatch logging
+- Other AWS services used by the application
+
+## Notes
+
+- These policies follow the principle of least privilege
+- Secrets are injected as environment variables by ECS
+- All configurations are designed for production security standards
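
Review note on "Secrets are injected as environment variables by ECS": in an ECS task definition this is done through the container definition's `secrets` list, where each entry pairs an environment variable `name` with a `valueFrom` secret ARN. The sketch below shows what that fragment could look like. The helper name, secret names, and ARNs are placeholders, since `phone-task-definition.json` is not shown in this part of the diff.

```python
import json

ARN_PREFIX = "arn:aws:secretsmanager:"


def container_secrets(secret_arns: dict[str, str]) -> list[dict[str, str]]:
    """Build the `secrets` list for an ECS container definition.

    `secret_arns` maps an environment variable name to the full ARN of
    the Secrets Manager secret that holds its value.
    """
    entries = []
    for env_name, arn in sorted(secret_arns.items()):
        if not arn.startswith(ARN_PREFIX):
            raise ValueError(f"{env_name}: expected a Secrets Manager ARN, got {arn!r}")
        entries.append({"name": env_name, "valueFrom": arn})
    return entries


if __name__ == "__main__":
    # Placeholder account ID and secret names, for illustration only.
    example = {
        "DAILY_API_KEY": "arn:aws:secretsmanager:us-east-1:123456789012:secret:pipecat/daily-api-key-AbCdEf",
        "TWILIO_AUTH_TOKEN": "arn:aws:secretsmanager:us-east-1:123456789012:secret:pipecat/twilio-auth-token-GhIjKl",
    }
    print(json.dumps({"secrets": container_secrets(example)}, indent=2))
```

The task execution role needs `secretsmanager:GetSecretValue` on these ARNs, which is what `execution-role-secrets-policy.json` grants for the `pipecat/` prefix.
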
diff --git a/...h/sample-codes/pipecat-voice-agent/aws/policies/ecs-task-execution-role-trust-policy.json b/...h/sample-codes/pipecat-voice-agent/aws/policies/ecs-task-execution-role-trust-policy.json
@@ -0,0 +1,12 @@
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Effect": "Allow",
+      "Principal": {
+        "Service": "ecs-tasks.amazonaws.com"
+      },
+      "Action": "sts:AssumeRole"
+    }
+  ]
+}
diff --git a/...o-speech/sample-codes/pipecat-voice-agent/aws/policies/execution-role-secrets-policy.json b/...o-speech/sample-codes/pipecat-voice-agent/aws/policies/execution-role-secrets-policy.json
@@ -0,0 +1,14 @@
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Effect": "Allow",
+      "Action": [
+        "secretsmanager:GetSecretValue"
+      ],
+      "Resource": [
+        "arn:aws:secretsmanager:eu-north-1:094271239310:secret:pipecat/*"
+      ]
+    }
+  ]
+}
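
Review note on `execution-role-secrets-policy.json`: the resource ARN hardcodes region `eu-north-1` and a specific account ID, while the README and `.env.example` default to `us-east-1`. One way to avoid the mismatch is to render the policy from parameters at deploy time. The sketch below is one possible approach, not what this PR does, and the function name is hypothetical.

```python
import json


def secrets_read_policy(region: str, account_id: str, prefix: str = "pipecat/") -> dict:
    """Render a policy allowing GetSecretValue on secrets under `prefix`."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit AWS account ID")
    resource = f"arn:aws:secretsmanager:{region}:{account_id}:secret:{prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [resource],
            }
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID.
    print(json.dumps(secrets_read_policy("us-east-1", "123456789012"), indent=2))
```

Called with `eu-north-1` and the account ID in the diff, it reproduces the committed file's statement. Called with the deployer's own values, it keeps the policy aligned with `AWS_REGION`.
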
diff --git a/speech-to-speech/sample-codes/pipecat-voice-agent/aws/policies/pipecat-task-policy.json b/speech-to-speech/sample-codes/pipecat-voice-agent/aws/policies/pipecat-task-policy.json
@@ -0,0 +1,22 @@
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Effect": "Allow",
+      "Action": [
+        "bedrock:InvokeModel",
+        "bedrock:InvokeModelWithResponseStream"
+      ],
+      "Resource": "*"
+    },
+    {
+      "Effect": "Allow",
+      "Action": [
+        "secretsmanager:GetSecretValue"
+      ],
+      "Resource": [
+        "arn:aws:secretsmanager:eu-north-1:094271239310:secret:pipecat/*"
+      ]
+    }
+  ]
+}
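
Review note on `pipecat-task-policy.json`: the Bedrock statement uses `"Resource": "*"`, which sits uneasily with the "least privilege" note in `aws/README.md`. A small check like the one below could flag such statements during review or CI. It is a sketch under simple assumptions: it only looks for a literal `*` resource on `Allow` statements and does not evaluate conditions or partial wildcards.

```python
import json


def wildcard_statements(policy: dict) -> list[dict]:
    """Return Allow statements whose Resource includes a bare "*"."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    flagged = []
    for stmt in statements:
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if stmt.get("Effect") == "Allow" and "*" in resources:
            flagged.append(stmt)
    return flagged


# The task policy from this diff, inlined so the example is self-contained.
TASK_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": ["arn:aws:secretsmanager:eu-north-1:094271239310:secret:pipecat/*"]
    }
  ]
}
""")

if __name__ == "__main__":
    for stmt in wildcard_statements(TASK_POLICY):
        print("Wildcard resource for actions:", ", ".join(stmt["Action"]))
```

For this policy the check flags only the Bedrock statement. Scoping that statement to the specific model ARN the agent invokes would bring it in line with the stated least-privilege goal.
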