Closed
30 changes: 17 additions & 13 deletions LICENSE
@@ -1,17 +1,21 @@
-MIT No Attribution
+MIT License

-Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+Copyright (c) 2025 Pipecat Voice AI Agent AWS Deployment

-Permission is hereby granted, free of charge, to any person obtaining a copy of
-this software and associated documentation files (the "Software"), to deal in
-the Software without restriction, including without limitation the rights to
-use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
-the Software, and to permit persons to whom the Software is furnished to do so.
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:

-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
-FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
-COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
-IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
-CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.

+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
4 changes: 4 additions & 0 deletions speech-to-speech/README.md
@@ -43,6 +43,10 @@ The following projects were developed by AWS teams and showcase examples of how
This serverless implementation provides a lightweight, easily deployable, and scalable Nova Sonic infrastructure using AWS Lambda and AppSync Events, offering a streamlined approach to real-time speech-to-speech communication. It features serverless real-time communication between server and client using AppSync Events, reference to past conversation history, tool use implementation, automatic resume for conversations exceeding 8 minutes, and an extensible web UI built with Next.js.


- [Pipecat Voice AI Agent - Production AWS Deployment](sample-codes/pipecat-voice-agent/)

A production-ready deployment of the Pipecat Voice AI Agent that supports two voice channels: Twilio phone calls and WebRTC browser chat. This sample demonstrates Amazon Nova Sonic integration with infrastructure as code in AWS CDK and supports both ECS and EKS deployment options. It includes SSL certificate management for Twilio webhooks, auto-scaling, monitoring, security best practices, and documentation for production deployments.

- [Sonic Playground for Experimenting](https://github.com/aws-samples/sample-sonic-java-playground)

This solution serves as an experimental playground for developers to test and optimize Nova Sonic capabilities by configuring various model parameters and finding the optimal settings for their specific use cases. The application supports creating new conversation sessions with voice IDs for language selection, TopP, Temperature, MaxTokens for response length control, and system prompts. Built with Java Spring Boot and React, it provides a reference implementation for speech-to-speech applications.
25 changes: 25 additions & 0 deletions speech-to-speech/sample-codes/pipecat-voice-agent/.env.example
@@ -0,0 +1,25 @@
# Daily.co API Configuration
DAILY_API_KEY=your_daily_api_key_here
DAILY_API_URL=https://api.daily.co/v1

# AWS Configuration
AWS_ACCESS_KEY_ID=your_aws_access_key_here
AWS_SECRET_ACCESS_KEY=your_aws_secret_key_here
AWS_REGION=us-east-1

# Twilio Configuration (for phone service)
TWILIO_RECOVERY_CODE=your_twilio_recovery_code
TWILIO_ACCOUNT_SID=your_twilio_account_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
TWILIO_PHONE_NUMBER=+1234567890
TWILIO_SID=your_twilio_sid
TWILIO_SECRET=your_twilio_secret
TWILIO_AUTH_LIVE=your_twilio_auth_live

# Optional Configuration
ENVIRONMENT=development
LOG_LEVEL=INFO
HOST=0.0.0.0
FAST_API_PORT=7860
MAX_BOTS_PER_ROOM=1
MAX_CONCURRENT_ROOMS=10
164 changes: 164 additions & 0 deletions speech-to-speech/sample-codes/pipecat-voice-agent/README.md
@@ -0,0 +1,164 @@
# Pipecat Voice AI Agent - AWS Cloud Deployment

A production-ready containerized deployment of the Pipecat Voice AI Agent on AWS, featuring both WebRTC and Twilio phone integration with AWS Nova Sonic for natural voice conversations. Supports both ECS and EKS deployment options.

## Overview

This sample demonstrates how to deploy a voice AI agent using:
- **Amazon Nova Sonic** for speech-to-speech voice processing
- **Pipecat framework** for voice AI conversations
- **Twilio** for phone call integration
- **Daily.co** for WebRTC browser-based voice chat
- **AWS ECS/EKS** for scalable container deployment
- **AWS CDK** for infrastructure as code

## Architecture

The solution provides two deployment options:
- **ECS**: Managed container orchestration with Fargate
- **EKS**: Kubernetes-native deployment with Fargate

Both support:
- Phone calls via Twilio WebSocket integration
- Browser voice chat via WebRTC
- AWS Nova Sonic for natural voice processing
- Production-ready monitoring and scaling

## Prerequisites

- AWS CLI configured with appropriate permissions
- Node.js 18+ and npm
- Docker
- Python 3.10+
- AWS CDK CLI (`npm install -g aws-cdk`)

## Quick Start

1. **Clone and setup**:
```bash
git clone <repository-url>
cd speech-to-speech/sample-codes/pipecat-voice-agent
./setup-project.sh
```

2. **Configure secrets**:
```bash
cp .env.example .env
# Edit .env with your API keys
python3 scripts/setup-secrets.py
```

3. **Deploy infrastructure** (choose ECS or EKS):

**ECS Deployment:**
```bash
cd infrastructure
./deploy.sh --environment test --region us-east-1
```

**EKS Deployment:**
```bash
cd infrastructure/-eks
cdk deploy PipecatEksStack --parameters environment=test
```

4. **Build and deploy application**:
```bash
./scripts/build-and-push.sh -e test -t latest
./scripts/deploy-service.sh -e test -t latest
```
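
Step 2 loads the values from `.env` into AWS Secrets Manager. The contents of `scripts/setup-secrets.py` are not shown in this change, so the following is only a minimal sketch of how such a script could work. It assumes a single JSON secret named `pipecat/app` (a hypothetical name, chosen to fall under the `pipecat/*` prefix used by the IAM policies) and that `boto3` is installed. The function names are illustrative, not the script's actual API.

```python
import json


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def upload_secret(values: dict[str, str],
                  name: str = "pipecat/app",      # hypothetical secret name
                  region: str = "us-east-1") -> None:
    """Create the secret, or add a new version if it already exists."""
    import boto3  # imported lazily so parsing works without boto3 installed

    client = boto3.client("secretsmanager", region_name=region)
    payload = json.dumps(values)
    try:
        client.create_secret(Name=name, SecretString=payload)
    except client.exceptions.ResourceExistsException:
        client.put_secret_value(SecretId=name, SecretString=payload)
```

A caller would read the file and pass the result on, for example `upload_secret(parse_env_file(open(".env").read()))`. Storing all keys in one JSON secret keeps the number of secrets small; one secret per key is an equally valid layout.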

## Key Features

### Voice AI Capabilities
- **Natural Conversations**: Amazon Nova Sonic provides human-like speech synthesis
- **Real-time Processing**: Low-latency, streaming speech-to-speech processing
- **Multi-channel Support**: Both phone calls and web browser voice chat
- **Function Calling**: Example weather function with extensible architecture

### Production Infrastructure
- **Auto-scaling**: ECS/EKS services scale based on demand
- **High Availability**: Multi-AZ deployment with load balancing
- **Security**: AWS Secrets Manager, IAM roles, VPC isolation
- **Monitoring**: CloudWatch logs, metrics, and health checks
- **SSL/TLS**: Automatic certificate management for Twilio webhooks

### Twilio Integration
- **Phone Number Support**: Inbound calls to your Twilio number
- **WebSocket Streaming**: Real-time bidirectional audio
- **SSL Certificate Requirements**: Production-ready HTTPS endpoints
- **Call Management**: Active call monitoring and session handling

## Environment Variables

Required configuration (stored in AWS Secrets Manager):

```bash
# Daily.co WebRTC
DAILY_API_KEY=your_daily_api_key

# AWS Configuration
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key

# Twilio Phone Integration
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+1234567890
```
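
Because a missing key usually surfaces only when the first call or room is created, it can help to check the configuration at startup. The sketch below is one way to do that, assuming the variables listed above are exposed to the process as environment variables. `REQUIRED_VARS`, `missing_vars`, and `check_environment` are hypothetical names, not part of the sample.

```python
import os

# Names taken from the configuration block above.
REQUIRED_VARS = (
    "DAILY_API_KEY",
    "AWS_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
)


def missing_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def check_environment(env=None):
    """Raise at startup instead of failing later on the first request."""
    missing = missing_vars(env)
    if missing:
        raise RuntimeError("Missing required configuration: " + ", ".join(missing))
```

Calling `check_environment()` before the server starts listening turns a late runtime failure into an immediate, readable error.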

## Testing Your Deployment

### WebRTC Voice Chat
1. Visit your load balancer URL
2. Click "Connect" to join a voice room
3. Speak to interact with the AI agent

### Phone Integration
1. Configure the voice webhook for your Twilio number to point to your deployment's HTTPS endpoint
2. Call your Twilio phone number
3. Have a voice conversation with the AI
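
When Twilio calls the webhook, the service typically answers with TwiML that tells Twilio to open a bidirectional audio stream over WebSocket. The handler in this sample is not shown here, so the following is only a sketch of what such a response could look like. The `/ws` path and the `build_stream_twiml` name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(websocket_url: str) -> str:
    """Build a TwiML document that connects the call to a media stream."""
    # Twilio documents wss as the scheme for <Stream> URLs.
    if not websocket_url.startswith("wss://"):
        raise ValueError("expected a wss:// URL for the media stream")
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=websocket_url)
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical host and path; substitute your load balancer's domain.
    print(build_stream_twiml("wss://voice.example.com/ws"))
```

Using `<Connect><Stream>` rather than `<Start><Stream>` makes the stream bidirectional, which is what a conversational agent needs to send audio back to the caller.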

## Important: Twilio SSL Requirements

For production Twilio integration:
- **Valid SSL Certificate**: Must be from a trusted CA (Let's Encrypt, etc.)
- **No Self-Signed Certificates**: Twilio rejects untrusted certificates
- **HTTPS Required**: Use standard port 443
- **Load Balancer SSL**: Terminate TLS at the load balancer; AWS Certificate Manager (ACM) can issue and automatically renew the certificate
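
Before pointing Twilio at an endpoint, it can be useful to confirm that the certificate validates against a standard trust store. The sketch below uses only the Python standard library. It approximates the check with the local system's CA bundle, which is an assumption and may differ from the CAs Twilio trusts. The function names are hypothetical.

```python
import socket
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Context that trusts only system CAs and verifies the hostname."""
    context = ssl.create_default_context()
    # These are already the defaults; set explicitly to document intent.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def certificate_is_trusted(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return False if the handshake fails certificate verification.

    Network errors (DNS failure, refused connection) are not caught and
    propagate to the caller.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with strict_tls_context().wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False
```

A self-signed or expired certificate makes `certificate_is_trusted` return `False`, which matches the cases listed above that Twilio rejects.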

## Documentation

- [EKS Architecture Overview](docs/EKS_ARCHITECTURE.md)
- [Deployment Guide](infrastructure/DEPLOYMENT_GUIDE.md)
- [Cleanup Guide](docs/CLEANUP_GUIDE.md)
- [Troubleshooting Guide](docs/TROUBLESHOOTING_GUIDE.md)

## Cost Considerations

- **Fargate**: Pay only for running containers
- **Nova Sonic**: Usage-based pricing for speech processing
- **Load Balancers**: Fixed hourly cost plus data transfer
- **Twilio**: Per-minute charges for phone calls

## Security Best Practices

- Secrets stored in AWS Secrets Manager
- IAM roles with least-privilege access
- VPC isolation with security groups
- Container runs as non-root user
- TLS encryption for all external communication

## Contributing

This sample follows AWS best practices for:
- Infrastructure as Code (CDK)
- Container security
- Monitoring and observability
- Cost optimization
- Multi-AZ high availability

## License

This sample code is made available under the MIT license. See the LICENSE file.
51 changes: 51 additions & 0 deletions speech-to-speech/sample-codes/pipecat-voice-agent/aws/README.md
@@ -0,0 +1,51 @@
# AWS Configuration

This directory contains AWS-specific configuration files for the Pipecat ECS deployment.

## Structure

### Policies (`policies/`)

- `ecs-task-execution-role-trust-policy.json` - Trust policy for ECS task execution role
- `execution-role-secrets-policy.json` - Policy for accessing AWS Secrets Manager
- `pipecat-task-policy.json` - Task-specific permissions policy

### Task Definitions (`task-definitions/`)

- `phone-task-definition.json` - ECS task definition for the phone service

## Usage

These files are typically used by:

- AWS CDK infrastructure deployment (in `infrastructure/` directory)
- Manual AWS CLI commands for policy and role creation
- ECS service deployment scripts

## Policy Overview

### ECS Task Execution Role

Allows ECS to pull images from ECR and write logs to CloudWatch.

### Secrets Access Policy

Grants access to specific secrets in AWS Secrets Manager for:

- Daily.co API keys
- Twilio credentials
- Other application secrets
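
The secrets policy in `policies/` names a specific region and account in its resource ARN. One way to keep that ARN in step with the target environment is to generate the policy document rather than edit it by hand. This is a minimal sketch under that assumption. `secrets_policy` is a hypothetical helper, and the account ID in the usage line is a placeholder.

```python
import json


def secrets_policy(region: str, account_id: str, prefix: str = "pipecat") -> dict:
    """Build a policy allowing reads of secrets under the given prefix."""
    resource = f"arn:aws:secretsmanager:{region}:{account_id}:secret:{prefix}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [resource],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID; substitute the deployment account.
    print(json.dumps(secrets_policy("us-east-1", "123456789012"), indent=2))
```

Scoping the resource to a prefix keeps the role from reading unrelated secrets in the same account.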

### Task Policy

Application-level permissions for:

- Amazon Bedrock access
- CloudWatch logging
- Other AWS services used by the application

## Notes

- These policies follow the principle of least privilege
- Secrets are injected as environment variables by ECS
- All configurations are designed for production security standards
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "arn:aws:secretsmanager:eu-north-1:094271239310:secret:pipecat/*"
      ]
    }
  ]
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "arn:aws:secretsmanager:eu-north-1:094271239310:secret:pipecat/*"
      ]
    }
  ]
}